How to create and run your own w3c validation server

May 25, 2014

In my recent talk at the Tridion developer summit, I showed some techniques for automating the validation of your web pages during development. A key enabler for these techniques was having a local instance of the w3c validator - after all, you don’t want to be firing repeated tests at the public w3c service, and during development you may not be able to, so you need your own server.

Fortunately, it’s not hard to set one up, so this blog post is a how-to - it should only take a couple of hours. I’m doing another one as I write this, just for the practice. Here goes:

I’m using VMware, and I’ve just created an empty image, configured to expect CentOS Linux. I also have a download of the CentOS DVD - CentOS-6.5-x86_64-bin-DVD1.iso is what I’m using, so if you use anything else YMMV. Instead of the default 20GB that VMware suggests for the disk, be brave and dial it right back to 5GB. By the way - don’t let VMware do the install for you - you’ll get a GUI and a whole bunch of other stuff you don’t need, and there’s fat chance of fitting it in 5GB. (Remember when 5GB would have been huge?!)

If you’re going to run this on your laptop with everything else, you might as well also click “Customize hardware” and dial the memory back to half a gig. While you’re in the settings, hook up your iso to the DVD drive, and then just start it up.

Step through the CentOS installer, and just make the obvious choices… so partition the entire drive, etc., etc. At some point comes a reboot, and you can log in again. Here, you might want to create some accounts and so forth. For the sake of brevity, I’m just going to assume that the only login will be root - at least you should be able to remember the password from a minute or two ago!!

My editor of choice in situations like this is vim, which comes loaded with the installation you just did. If another editor suits you better, then you can yum for it.

Anyway - CentOS is configured by default to be secure, which means the network interface doesn’t come up on boot until you configure it to.

vi /etc/sysconfig/network-scripts/ifcfg-eth0

and change ONBOOT to yes (so the line reads ONBOOT=yes), save the file, and then reboot (shutdown -r now).
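If you’d rather script that edit than do it by hand in vi, something like the following sketch should do. The function name set_onboot is my own invention for illustration, and I’m assuming the file has the usual KEY=value layout.

```shell
# set_onboot FILE: make sure the interface config contains ONBOOT=yes.
# (set_onboot is a hypothetical helper name, not part of CentOS.)
set_onboot() {
    so_cfg="$1"
    if grep -q '^ONBOOT=' "$so_cfg"; then
        # Rewrite the existing setting, whatever its current value
        so_tmp=$(mktemp) || return 1
        sed 's/^ONBOOT=.*/ONBOOT=yes/' "$so_cfg" > "$so_tmp" &&
            cat "$so_tmp" > "$so_cfg"
        so_rc=$?
        rm -f "$so_tmp"
        return $so_rc
    fi
    # No ONBOOT line at all, so append one
    echo 'ONBOOT=yes' >> "$so_cfg"
}
```

You would call it as set_onboot /etc/sysconfig/network-scripts/ifcfg-eth0 and then reboot as before.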

After rebooting, you may wish to install VMware Tools - if only to stop it nagging you. Like so:

mkdir /mnt/cdrom
mount /dev/cdrom /mnt/cdrom
cp /mnt/cdrom/VMwareTools-9.2.4-1398046.tar.gz ~
cd ~
tar -xzf VMwareTools-9.2.4-1398046.tar.gz
yum install perl
./vmware-tools-distrib/vmware-install.pl

Accept the defaults, restart and log in as root.

The w3c markup validator isn’t available in the standard repositories that CentOS already knows about. To get access to it, you now need to install EPEL (Extra Packages for Enterprise Linux). The following command worked for me:

rpm -Uvh

You are now ready to install the validator, as follows:

yum install w3c-markup-validator

At this point I also did a “yum update” on the grounds that being up-to-date is usually good.

We now need to set up the HTML5 validator and configure the w3c validator to use it when appropriate. Here are the steps:

yum install java-1.6.0-openjdk-devel
yum install python subversion git mercurial
mkdir /usr/share/html5-checker
cd /usr/share/html5-checker
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64/
hg clone build
python build/ all
python build/ all

Note that the last line is deliberately a repeat. The first time this runs, it will fail, but the second time should run OK.
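If you’re scripting the whole install, you can capture the “run it again if it fails” idea in a small helper instead of repeating the line. This is only a sketch: run_with_retry is a name I’ve made up, and unlike the literal repeat above it only runs the command a second time when the first attempt actually fails.

```shell
# run_with_retry CMD [ARGS...]: run a command and, if it fails,
# run it exactly once more. (Hypothetical helper for illustration.)
run_with_retry() {
    if "$@"; then
        return 0
    fi
    echo "first attempt failed, trying once more: $*" >&2
    "$@"
}
```

You would put run_with_retry in front of the build command and let it decide whether the second run is needed.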

This will leave the validator running on port 8888. If you want to see it running, you’ll need to configure the firewall to allow it. In addition, you will find that SELinux (Security-Enhanced Linux) is installed by default, which will also cause you some problems. So the next step is to configure iptables and SELinux in a way that is appropriate for your environment. For example, the security configuration appropriate for the image I’m building right now is as follows:

/etc/init.d/iptables stop
/sbin/chkconfig iptables off
vi /etc/sysconfig/selinux (change the SELINUX setting to 'disabled')
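The SELinux edit can be scripted too. One thing to watch: on CentOS, /etc/sysconfig/selinux is a symlink to /etc/selinux/config, and a plain sed -i will replace the symlink with an ordinary file. The sketch below writes through the link instead. The function name set_selinux_mode is my own, for illustration only.

```shell
# set_selinux_mode FILE MODE: set the SELINUX= line in FILE to MODE,
# writing through a symlink rather than replacing it.
# (Hypothetical helper name; MODE is one of the three values SELinux accepts.)
set_selinux_mode() {
    sel_cfg="$1"
    sel_mode="$2"
    case "$sel_mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unknown SELinux mode: $sel_mode" >&2; return 1 ;;
    esac
    sel_tmp=$(mktemp) || return 1
    # '^SELINUX=' does not match the separate SELINUXTYPE= line
    sed "s/^SELINUX=.*/SELINUX=$sel_mode/" "$sel_cfg" > "$sel_tmp" &&
        cat "$sel_tmp" > "$sel_cfg"
    sel_rc=$?
    rm -f "$sel_tmp"
    return $sel_rc
}
```

For the image described here the call would be set_selinux_mode /etc/sysconfig/selinux disabled, followed by a reboot for it to take effect.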

To see it working you need to start apache:

apachectl start

and to make this the default on start:

/sbin/chkconfig httpd on

You can now see the validator running on port 80 at /w3c-validator/ (don’t forget the final slash).
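If you don’t have a browser handy, a quick check from another machine with curl works just as well. Here’s a rough sketch: validator.local is a made-up host name standing in for whatever your server is called, and the two function names are mine.

```shell
# validator_url HOST: build the validator's front-page URL,
# making sure it ends in the final slash.
validator_url() {
    vu_host="${1%/}"    # drop one trailing slash if the caller supplied it
    printf 'http://%s/w3c-validator/\n' "$vu_host"
}

# check_validator HOST: succeed if the front page answers without an HTTP error.
# -f fails on HTTP errors, -sS is quiet but still reports problems,
# -o /dev/null throws the page body away.
check_validator() {
    curl -fsS -o /dev/null "$(validator_url "$1")"
}
```

So check_validator validator.local && echo "validator is up" would be a reasonable smoke test after a reboot.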

OK - it’s running, and so is the HTML5 validator, but the w3c validator needs to know how to hand off HTML validation requests properly. Edit the following file:


Navigate to the bottom of the file and uncomment the line that begins with “HTML”

It’s also handy to find the “Allow Private IPs” setting and set it to ‘yes’.

We’re almost there. Everything’s configured as we want it, and apache will start automatically. All that remains is to ensure that the HTML5 checker also starts automatically. To do this, create the file “/etc/init/html5-validator.conf” and edit the content to be as follows:

description "HTML5 Validator"
start on runlevel [234]
stop on runlevel [0156]
chdir /usr/share/html5-checker
exec python build/ --control-port=8889 run

After rebooting the image, you should find that your server is now working. You may wish to give it a name in your DNS, and it’s also useful if the servers you’ll be validating are named in a DNS that’s available to the validator. In practice, if I have a development server that’s not going to be in the company internal DNS, I find it just as easy to add it in /etc/hosts on the validator.
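Adding those /etc/hosts entries is easy to script, and it’s worth making the script safe to run twice. A minimal sketch follows, with add_host_entry being my own name for it and the address and host name in the usage line being invented examples.

```shell
# add_host_entry IP NAME [FILE]: append "IP<tab>NAME" to the hosts file
# unless NAME is already listed on a non-comment line. FILE defaults to /etc/hosts.
# (Hypothetical helper for illustration.)
add_host_entry() {
    he_ip="$1"
    he_name="$2"
    he_file="${3:-/etc/hosts}"
    # Look for NAME as an exact host-name field (fields 2 onwards)
    if awk -v n="$he_name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$he_file"; then
        return 0
    fi
    printf '%s\t%s\n' "$he_ip" "$he_name" >> "$he_file"
}
```

For example, add_host_entry 192.168.1.50 dev.example.test on the validator would let it resolve a development server that isn’t in the internal DNS.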

With this in place, you now have nothing stopping you from automatically executing HTML5 validation and ensuring the right quality right through the development process. It’s definitely worthwhile to know about invalid HTML as soon as you create it. If you have to fix the same bug a week or a month later, the chances that you still fully understand the code will have greatly diminished. Happy validating.
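To give a flavour of what the automation can look like, here is a small sketch that asks the validator to check a page and reads the verdict from the response headers. I’m assuming the validator answers at /w3c-validator/check with a uri parameter and reports its result in an X-W3C-Validator-Status header, as the w3c validator’s documented API describes; check this against your own install. The function names and host names are mine.

```shell
# validator_status: read HTTP response headers on stdin and print the
# value of the X-W3C-Validator-Status header (header names are case-insensitive).
validator_status() {
    tr -d '\r' | awk -F': *' '
        !seen && tolower($1) == "x-w3c-validator-status" { print $2; seen = 1 }'
}

# validate_page VALIDATOR_BASE PAGE_URL: succeed only if the status is "Valid".
# -D - prints the response headers, -o /dev/null discards the report body,
# -G with --data-urlencode sends the page address as a URL-encoded query string.
validate_page() {
    vp_status=$(curl -sS -o /dev/null -D - -G \
        --data-urlencode "uri=$2" "$1/check" | validator_status)
    echo "$2: ${vp_status:-no status header}"
    [ "$vp_status" = "Valid" ]
}
```

A build step could then run validate_page http://validator.local/w3c-validator http://dev.example.test/index.html and fail the build on a non-zero exit code.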

Dominic Cronin
