The W3C maintains an HTML validator service, The W3C Validator. If you try it on various pages on the Internet that were created using the methods described in this document (i.e. DocBook SGML, openjade etc.), you may be surprised that so many of them return the following error:
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document.
This is because their authors did not take the trouble to examine the generated HTML documents and make them conform to the HTML standards. There is, in fact, some work involved if you want your HTML documents to obey the standards set by the W3C, but this work too is automated in the scripts presented here! Let's have a look at how this is done:
The above error from the W3C HTML Validator comes from the fact that the documents, as produced by openjade and the configuration settings discussed so far, do not include something like
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
But even if they did, the all-important DOCTYPE statement
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
would also be missing, making validation against an HTML DTD impossible. This may be a deliberate “feature” of the tools involved in the document creation chain. But it may also be due to an option that I have overlooked all this time! If you happen to know of such an option (perhaps in the HTML stylesheet?), please don't hesitate to contact me.
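Before submitting a page to the validator, you can check locally whether it carries both declarations. The following is only a minimal sketch and not part of the scripts presented here: the function name check_html_headers and the two sample files are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether an HTML file contains a DOCTYPE declaration
# and a charset in a Content-Type meta tag.
# check_html_headers is a hypothetical helper, not part of the lyxtox scripts.
check_html_headers() {
    file=$1
    status=0
    # -i: tag case may differ between generated and hand-written files
    if ! grep -qi '<!DOCTYPE[[:space:]]\{1,\}HTML' "$file"; then
        echo "$file: no DOCTYPE declaration"
        status=1
    fi
    if ! grep -qi 'http-equiv="Content-Type".*charset=' "$file"; then
        echo "$file: no charset in a Content-Type meta tag"
        status=1
    fi
    return $status
}

# Demo on two throw-away files
workdir=$(mktemp -d)
printf '%s\n' '<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY></BODY></HTML>' \
    > "$workdir/bare.html"
printf '%s\n' \
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' \
    '<html><head><title>Test</title>' \
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">' \
    '</head><body></body></html>' > "$workdir/fixed.html"

check_html_headers "$workdir/bare.html" || echo "bare.html needs fixing"
check_html_headers "$workdir/fixed.html" && echo "fixed.html looks complete"
```

Such a check only tells you that the two declarations are present; whether the page really conforms to the DTD is still for the validator to decide.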
The way I decided to close this gap is an idea I borrowed from Hugo van der Kooij, from his document on how to set up your own docbook processing: after the HTML document has been created, extract two parts from it, the title part and the body part, and store them in separate temporary files called title.tmp and body.tmp respectively. The splitting is done by an awk script called htmlsplit.awk. You should also have created three text files:
your replacement HTML code up to the <title> tag - this is what we will call part1,
the part that follows the title, which closes the title and the head and opens the body (it could even contain some navigational menu structure specific to your website, but we will not pursue this further here, see Hugo's original document for this) - this is what we will call part2,
the footer part - which we will call part3.
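To make the splitting step more concrete, here is a rough sketch of what such a split could look like. It is not the actual htmlsplit.awk, whose contents are not shown here. The input file page.html is a made-up sample, and the sketch assumes that the document contains exactly one title element and one body element.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample input standing in for a generated page
cat > page.html <<'EOF'
<HTML><HEAD><TITLE>My Document</TITLE>
</HEAD><BODY CLASS="ARTICLE">
<H1>My Document</H1>
<P>Some text.</P>
</BODY></HTML>
EOF

awk '
# Read the whole file into one string
{ doc = doc $0 "\n" }
END {
    # Lower-case copy for case-insensitive searching; offsets carry over
    low = tolower(doc)

    # Title: the text between <title> and </title>
    s = index(low, "<title>") + length("<title>")
    e = index(low, "</title>")
    printf "%s\n", substr(doc, s, e - s) > "title.tmp"

    # Body: everything after the end of the <body ...> tag, up to </body>
    b = index(low, "<body")
    rest = substr(low, b)
    s = b + index(rest, ">")   # first character after the closing ">"
    e = index(low, "</body>")
    printf "%s", substr(doc, s, e - s) > "body.tmp"
}' page.html

cat title.tmp
```

The sketch does no error handling: if a tag is missing, index returns 0 and the output is meaningless, so a real script would have to test for that.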
Part1 looks like
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
so that is where the DOCTYPE statement goes! Part2 contains:
</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
so that is where the encoding information goes! I have also set the background colour to fit my site's design. Part3 contains
<table width="100%">
<tr align="center">
<td valign="middle">
<a href="http://validator.w3.org/check?ss=1&sp=1&uri=http%3A%2F%2F_DOMAIN_%2F_DIRNAME_%2F_FILENAME_">
<img border="0" src="images/valid-html401.png" alt="Valid HTML 4.01!" height="31" width="88"></a>
</td>
<td valign="middle">
<a href="http://counter.li.org">
<img border="0" src="images/linux_user_314103.png" alt="Linux User #314103"></a>
</td>
<td valign="middle">
<a href="http://www.anybrowser.org/campaign/">
<img border="0" src="images/w3c_ab.png" alt="Best viewed with ANY browser!"></a>
</td>
</tr>
</table>
</body>
</html>
and, as you can easily see, is a customized footer. Now we proceed to concatenate all the above files into one HTML document, in the following order:
part1
title
part2
body
part3
At the end we get an HTML document that is customized to the design of our site and can easily be checked for compliance with the HTML standards!
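The assembly step itself needs nothing more than cat. The following sketch uses shortened stand-ins for the five files so that it can be run on its own; the output name index.html is an assumption made for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened stand-ins for the real files
printf '%s\n' \
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' \
    '<html>' '<head>' '<title>' > part1
printf '%s\n' 'My Document' > title.tmp
printf '%s\n' '</title>' \
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">' \
    '</head>' '<body bgcolor="#FFFFFF">' > part2
printf '%s\n' '<h1>My Document</h1>' > body.tmp
printf '%s\n' '</body>' '</html>' > part3

# cat writes its arguments to the output in the order given
cat part1 title.tmp part2 body.tmp part3 > index.html

cat index.html
```

Because the title text lands between the <title> of part1 and the </title> of part2, the order of the arguments is all that matters.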
HTML Parameters and Chunking
You can achieve a similar result by setting the html-header-tags parameter accordingly in the HTML DSSSL stylesheet (Section 4.2, Section 7.1.5). The html-header-tags parameter should contain a list of the HTML HEAD tags that should be generated. The format is a list of lists; each interior list consists of a tag name and a set of attribute/value pairs: '(("META" ("NAME" "name") ("CONTENT" "content"))).
Of course, you would have to change the html-header-tags parameter in the .dsl file each time before processing a new document. You would thus need some kind of placeholders that could be identified and changed with sed to the values actually needed. However, this amounts to the same effort that we are currently investing with our method, which also substitutes various placeholders in parts 1-3. Obviously, such flexibility must come at some cost. Feel free to experiment with the other HTML Parameters for Chunking too!
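As an illustration of the placeholder idea, the sketch below keeps the parameter definition in a small template and lets sed fill in the value. The file names, the _CHARSET_ placeholder and the use of an HTTP-EQUIV attribute in the list are assumptions made for this example. The parameter is written with percent signs here, as it is named in the DocBook DSSSL stylesheets, and the resulting fragment would still have to be placed into the HTML style specification of your own .dsl file.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Template fragment; _CHARSET_ is a made-up placeholder
cat > header-tags.dsl.in <<'EOF'
(define %html-header-tags%
  '(("META" ("HTTP-EQUIV" "Content-Type")
            ("CONTENT" "text/html; charset=_CHARSET_"))))
EOF

# Value to substitute (assumption for this demo)
CHARSET=iso-8859-1

sed -e "s/_CHARSET_/$CHARSET/g" header-tags.dsl.in > header-tags.dsl

cat header-tags.dsl
```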
The footer file, part3, deserves some extra attention, since it illustrates the kind of customization and control over your HTML output that you can achieve with this method: it contains the HTML code that prints three icons in a row - a W3C HTML validation icon, a Linux Counter icon and an icon from the "any browser" campaign. There are three placeholders in the link for the HTML validation icon: _DOMAIN_, _DIRNAME_ and _FILENAME_. These are substituted on the fly (using sed one-liners) with the domain, directory and filename respectively of the file whose footer we are currently processing:
$SED -e "s/_DOMAIN_/$DOMAIN/g" ${DATADIR}/part3 > part3_1.tmp
$SED -e "s/_DIRNAME_/$1/g" part3_1.tmp > part3_2.tmp
$SED -e "s/_FILENAME_/${BASENAME}/g" part3_2.tmp > part3.tmp
The result is an icon that, when clicked, will automatically pass the URI of the current file to The W3C Validator for HTML validation!
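If the intermediate files are not needed elsewhere, the three substitutions can also be done in a single sed call. The following is a sketch of that variant, not the way the scripts presented here do it; the sample values and the shortened part3 are assumptions made for the demo.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened stand-in for part3 with the three placeholders
cat > part3 <<'EOF'
<a href="http://validator.w3.org/check?uri=http%3A%2F%2F_DOMAIN_%2F_DIRNAME_%2F_FILENAME_">
EOF

# Example values (assumptions for this demo)
DOMAIN=www.example.org
DIRNAME=mySGML
BASENAME=html-validation.html

# One sed call, three substitutions; "|" is used as the delimiter so that
# a "/" in one of the values would not end the pattern early
sed -e "s|_DOMAIN_|$DOMAIN|g" \
    -e "s|_DIRNAME_|$DIRNAME|g" \
    -e "s|_FILENAME_|$BASENAME|g" part3 > part3.tmp

cat part3.tmp
```

Note that a value containing "&" or the chosen delimiter would still have to be escaped, since sed treats both specially in the replacement text.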
Please note:
A file containing graphical callouts (see Section 4.8 and Section 5.9) will NOT be validated! You will get an error saying
Also note that admonitions (see Section 4.7 and Section 5.8) will be positioned using a valign value other than the correct one, valign="middle", which again makes validation fail. But this is easily corrected by the sed script sedscr_val that is run near the end of the lyxtox script.
| Last updated Mon Sep 24 01:19:25 CEST 2007 | Permalink: http://www.karakas-online.de/mySGML/html-validation.html | All contents © 2002-2007 Chris Karakas |