In the main part, the hard work (for your computer) begins:
The document is exported from LyX to DocBook SGML:
$LYX -e docbook $1.lyx |
The SGML that is produced by LyX has several shortcomings. They have to be corrected. This is done by calling runsed:
$RUNSED $SEDSCR $1.sgml |
which is the subject of the next subsection.
Alternative commands:
There are quite a few alternative invocations of the various tools at appropriate places in the lyxtox script, in the form of comments. These are there in order to show you how you can achieve an equivalent result through other tools.
Runsed takes as argument the sedscript to run and the file against which to run it. It calls sed with sedscr as the “sed command file”. In the sedscr file itself there is another bunch of “magic” going on:
Important note:
The changes in LyX' SGML code presented here pertain strictly to LyX version 1.2.0! The 1.1.x versions needed slightly (and subtly) different changes and the same may be true for future versions of LyX. Examples of sed commands for previous LyX versions are presented in the sedscr file in comments. Use (or construct) the right sed commands for the right changes for your LyX version! The success of this method depends crucially on this.
The code
s/<\(sect[^>]*\)>\(<title>[^<]*\)<anchor \([^>]*\)>/<\1 \3>\2/g
s/<\(chapter\)>\(<title>[^<]*\)<anchor \([^>]*\)>/<\1 \3>\2/g
|
tells sed to substitute[1]
< sect1 >< title > some title < anchor id="some label" > |
with
<sect1 id="some label" ><title>some title |
and
< chapter >< title > some title < anchor id="some label" > |
with
<chapter id="some label"><title>some title |
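To see the first substitution in action, here is a minimal sketch that feeds one sample line through the same sed expression. The title "Introduction" and the id "intro" are made up for illustration and do not come from a real LyX export; a POSIX shell is assumed:

```shell
# Sample input: a section whose label was exported as an <anchor> inside the title.
# Title text and id are hypothetical.
sample='<sect1><title>Introduction<anchor id="intro">'

# Same expression as in sedscr: move the anchor's attributes into the sect tag.
result=$(printf '%s\n' "$sample" |
  sed -e 's/<\(sect[^>]*\)>\(<title>[^<]*\)<anchor \([^>]*\)>/<\1 \3>\2/g')

printf '%s\n' "$result"
```

The three groups capture the element name, the title up to the anchor, and the anchor's attributes, so the replacement can reorder them.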
The code
/^.*<figure><title><graphic/{
s/<figure><title><graphic fileref="\([^"]*\)">[ ]*<anchor id="\([^"]*\)">\([^<]*\)<\/title>[ ]*<\/figure>/\
<figure id="\2">\
<title>\
\3\
<\/title>\
<mediaobject>\
<\!\[ \%output\.print\.png; \[\
<imageobject>\
<imagedata fileref="\.\/images\/\1.png" format="PNG">\
<\/imageobject>\
\]\]>\
<\!\[ \%output\.print\.pdf; \[\
<imageobject>\
<imagedata fileref="\1.pdf" format="PDF" scale="65">\
<\/imageobject>\
\]\]>\
<\!\[ \%output\.print\.eps; \[\
<imageobject>\
<imagedata fileref="\1.eps" format="EPS">\
<\/imageobject>\
\]\]>\
<\!\[ \%output\.print\.bmp; \[\
<imageobject>\
<imagedata fileref="\1.bmp" format="BMP">\
<\/imageobject>\
\]\]>\
<textobject>\
<phrase>\3<\/phrase>\
<\/textobject>\
<caption>\
<para>\3<\/para>\
<\/caption>\
<\/mediaobject>\
<\/figure>\
/g
}
|
tells sed to substitute[2]
< figure >< title >< graphic fileref="imagename" > some blanks < anchor id="some id" >some title< /title > |
with the more elaborate combination of figure and mediaobject elements:
<figure id="some id">
<title>
some title
</title>
<mediaobject>
<![ %output.print.png; [
<imageobject>
<imagedata fileref="./images/imagename.png" format="PNG">
</imageobject>
]]>
<![ %output.print.pdf; [
<imageobject>
<imagedata fileref="imagename.pdf" format="PDF" scale="65">
</imageobject>
]]>
<![ %output.print.eps; [
<imageobject>
<imagedata fileref="imagename.eps" format="EPS">
</imageobject>
]]>
<![ %output.print.bmp; [
<imageobject>
<imagedata fileref="imagename.bmp" format="BMP">
</imageobject>
]]>
<textobject>
<phrase>some title</phrase>
</textobject>
<caption>
<para>some title</para>
</caption>
</mediaobject>
</figure>
|
There are some remarks due here:
The title of the original SGML appears in three places of the new SGML: the title, the phrase for the alternative text and the caption. LyX uses the figure caption for the title and there is no way we can derive three different texts for the three different uses in the new SGML. That is why in the output document the figure title, the alternative text and the figure caption are identical.
For the PNG format we must prefix the image file name, imagename.png, with the relative path (./images) to it, even though we set all environment variables correctly (see Section 7.1.3). This is not necessary for the other formats.
We scale the PDF images to 100%. I used to scale them down to 65%, but this is no longer necessary after some experimentation with various scale factors in the addd utility, which seem to compensate for this need. The listings shown here still carry the older scale="65" attribute. See also Section 4.9.
We make use of external SGML entities like %output.print.png; This is a topic of its own which is explained in detail in Section 7.2.2.
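The figure substitution above relies on three capture groups: the file name, the anchor id and the title text. The following sketch uses the same pattern with a short diagnostic replacement, so that you can check what each group captures before running the full script. The sample line is hypothetical; real LyX output may differ in whitespace:

```shell
# Made-up figure line in the form the sedscr pattern expects.
line='<figure><title><graphic fileref="paper-sizes"> <anchor id="fig-paper">ISO-DIN paper sizes.</title> </figure>'

# Same pattern as in sedscr, but the replacement only reports the groups.
# -n together with the p flag prints matching lines only.
captured=$(printf '%s\n' "$line" | sed -n \
  -e 's/<figure><title><graphic fileref="\([^"]*\)">[ ]*<anchor id="\([^"]*\)">\([^<]*\)<\/title>[ ]*<\/figure>/file=\1 id=\2 title=\3/p')

printf '%s\n' "$captured"
```

If nothing is printed for one of your figures, the pattern did not match and the figure would be left untouched by sedscr as well.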
A mediaobject similar to the above (but without figure id and caption) is inserted whenever a “simple” image, i.e. one without the float element with caption, is encountered in LyX' SGML. It substitutes a line like[3]
< graphic fileref="imagename" > |
with a mediaobject like
<mediaobject>
<![ %output.print.png; [
<imageobject>
<imagedata fileref="./images/imagename.png" format="PNG">
</imageobject>
]]>
<![ %output.print.pdf; [
<imageobject>
<imagedata fileref="imagename.pdf" format="PDF" scale="65">
</imageobject>
]]>
<![ %output.print.eps; [
<imageobject>
<imagedata fileref="imagename.eps" format="EPS">
</imageobject>
]]>
<![ %output.print.bmp; [
<imageobject>
<imagedata fileref="imagename.bmp" format="BMP">
</imageobject>
]]>
</mediaobject>
|
Notice that the text is now simply "Figure", since there was no caption. You may change it to something else. There is also no id available for this mediaobject, therefore you cannot cross-reference it. That's why I suggested floats in Section 5.7.
The following sed code
/^.*[^<]*<programlisting/s/<programlisting\([^>]*\)>/<screen\1>/g
/^.*[^<]*<\/programlisting>/s/<\/programlisting\([^>]*\)>/<\/screen\1>/g
|
will substitute <programlisting> with <screen>, while this one:
# Delete the <para> before the <tgroup> tag.
s/<para><tgroup/<tgroup/g
# Delete the </para> after the </tgroup> tag.
s/<\/tgroup><\/para>/<\/tgroup>/g
|
will delete the <para> before <tgroup> and the </para> after </tgroup>.
For table captions and titles to be output correctly, you have to eliminate the <para> from any sequence </title><para><tgroup> AND you have to write a table float (see Section 5.10), inside which you have to set the title and the caption environment on one line, then press <enter>, set the environment to "Standard" (this will produce the <para> element we eliminate here) and continue with the table normally. A warning about an "end tag for element "TABLE" which is not open" is the lesser evil we can get and is harmless (a LyX bug in 1.2.0, not openjade's):
/<\/title><para><tgroup/s/<\/title><para><tgroup/<\/title><tgroup/ |
Further, for the cross-references to tables to work, we have to substitute[4]
< table >< title >< anchor id="some id" > |
with
<table id="some id"><title> |
This is done with the following sed code:
s/<table>[ ]*<title>[ ]*<anchor \([^>]*\)>/<table \1><title>/g |
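As a quick check of the table rule, the sketch below runs it over two hypothetical lines: one table with an anchor and one without. Only the first should change. The ids and titles are invented for the example:

```shell
table_ids() {
  # Same expression as in sedscr: move the anchor's attributes into <table>.
  sed -e 's/<table>[ ]*<title>[ ]*<anchor \([^>]*\)>/<table \1><title>/g'
}

with_anchor=$(printf '%s\n' '<table><title><anchor id="tab-formats">Supported formats</title>' | table_ids)
without_anchor=$(printf '%s\n' '<table><title>No label here</title>' | table_ids)

printf '%s\n' "$with_anchor" "$without_anchor"
```

A table without a label stays as it is, which also means that it cannot be cross-referenced.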
Some minor issues still remain:
Substitute 'ldquo' with 'quot' and 'rdquo' with 'quot':
s/\&ldquo;/\&quot;/g
s/\&rdquo;/\&quot;/g
|
But we are not done with the quots yet: In <othercredit> we have to substitute[5]
& quot ; |
with " :
/<othercredit/s/\&quot;/"/g |
And: substitute[6]
& amp ; copy ; |
with &copy;. This will produce a copyright symbol instead of the literal text "&copy;":
s/\&amp;copy;/\&copy;/g |
Also, substitute &xxxx; with the character it represents - somehow these entities do not work:
s/&lsqb;/[/g
s/&rsqb;/]/g
s/&lcub;/{/g
s/&rcub;/}/g
s/&dollar;/$/g
s/&percnt;/%/g
s/&num;/#/g
s/&verbar;/|/g
s/&pound;/£/g
s/&lowbar;/_/g
s/&bsol;/\\/g
s/&tilde;/~/g
|
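A minimal sketch of this kind of entity replacement follows. It assumes that the entities in question carry the usual ISO names (&lsqb;, &rsqb;, &dollar; and so on); the sample line is made up:

```shell
# Made-up line containing three ISO-named entities (an assumption about
# what the exported SGML contains).
line='array&lsqb;0&rsqb; costs &dollar;5'

# In the pattern, & is an ordinary character; in the replacement,
# [ ] and $ are ordinary characters too, so no escaping is needed here.
plain=$(printf '%s\n' "$line" |
  sed -e 's/&lsqb;/[/g' -e 's/&rsqb;/]/g' -e 's/&dollar;/$/g')

printf '%s\n' "$plain"
```

Note that an unescaped & in the replacement part of an s command stands for the whole match, which is why the rules that write an entity (such as the one for quot) escape it there.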
Finally, for the index to be created, we have to insert the index creation command. Comment this out if you don't want an index. To substitute[7]
< /book > |
with[8]
&index; < /book > |
the following sed code is needed:
/<\/book>/s/<\/book>/\&index;\
<\/book>/ |
A similar sed code is there for the article document type. Currently, this part has been commented out and transferred to the sedscr_abi script, which inserts the entities for the Appendix, the Bibliography and the Index at once at the end of the document, before the closing </book> or </article> tags.
Finally, two calls to runsed with sedscr_tidy and sedscr_tidy2 as the script files will “tidy up” the SGML file:
$RUNSED $SEDSCRTIDY $1.sgml
$RUNSED $SEDSCRTIDY2 $1.sgml
|
sedscr_tidy consists simply of the following lines:
# Author: Chris Karakas
# http://www.karakas-online.de
#
# Part of the LyX-to-X project.
# See http://www.karakas-online.de/mySGML/ for a detailed
# description.
#
# Copyright (c) 2004, Chris Karakas
# http://www.karakas-online.de
# chris at mydomain dot de (see above for my domain)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; see the file COPYING. If not, write to
# the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
{
s/\([^\n]\)</\1\
</
s/>\([^\n]\)/>\
\1/
P
D
}
|
It does practically nothing else than insert a newline before an opening bracket (a "<") and after a closing one (a ">"). After this transformation has taken place for the whole document, runsed is called with sedscr_tidy2 as the sed script:
# Author: Chris Karakas
# http://www.karakas-online.de
#
# Part of the LyX-to-X project.
# See http://www.karakas-online.de/mySGML/ for a detailed
# description.
#
# Copyright (c) 2004, Chris Karakas
# http://www.karakas-online.de
# chris at mydomain dot de (see above for my domain)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; see the file COPYING. If not, write to
# the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
/<entry/{
N
s/<entry \([^>]*\)>[ \n\t]*<para>/<entry \1><para>/
p
d
}
/<\/para/{
N
s/<\/para>[ \n\t]*<\/entry>/<\/para><\/entry>/
p
d
}
/DATA/{
N
s/\n]]>/]]>/
P
D
}
/<citation/{
N
N
s/\n//g
p
d
}
|
sedscr_tidy2 will do some corrections, because the changes of sedscr_tidy went a bit too far:
It will delete any newline before the closing "]]>" of CDATA elements
and it will bring <entry> and <para> elements onto the same line, i.e. it will delete any newlines between them. It will do the same for the closing </para> and </entry> pairs. Both pairs occur inside tables and, due to the Pernicious Mixed Content Problem, they should not contain any line feed or space in between, or the parser will think that the table cell contains inline markup and issue the warning that the “document type does not allow element <para>” at that place. See the discussion of the “document type does not allow element <para>” error in Section 6.2, as well as in Openjade error: <para> not allowed after <entry>.
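The interplay of the two tidy scripts can be tried out on a single table cell. The sketch below wraps the substitution commands of sedscr_tidy and the <entry>/<para> parts of sedscr_tidy2 in two shell functions and pipes a made-up line through both. It assumes GNU sed, which treats \n and \t inside a bracket expression as newline and tab; other sed implementations may behave differently:

```shell
tidy() {
  # sedscr_tidy: newline before "<" and after ">", one output line per cycle.
  sed -e 's/\([^\n]\)</\1\
</' -e 's/>\([^\n]\)/>\
\1/' -e P -e D
}

tidy2() {
  # sedscr_tidy2 (table part only): rejoin <entry ...><para> and </para></entry>.
  sed -e '/<entry/{
N
s/<entry \([^>]*\)>[ \n\t]*<para>/<entry \1><para>/
p
d
}' -e '/<\/para/{
N
s/<\/para>[ \n\t]*<\/entry>/<\/para><\/entry>/
p
d
}'
}

# Hypothetical table cell, all on one line.
cell='<entry align="left"><para>cell</para></entry>'
split=$(printf '%s\n' "$cell" | tidy)
tidied=$(printf '%s\n' "$split" | tidy2)

printf '%s\n' "$tidied"
```

After the first stage every tag sits on its own line; the second stage removes only the line breaks between <entry> and <para> and between </para> and </entry>, leaving the cell text on a line of its own.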
Tidy scripts mess up code snippets:
The tidy scripts will mess up any part of your file that contains the < and > brackets. Especially code that is included verbatim (i.e. without the use of some external entity) and contains such brackets will look awkward. Callouts will also be affected. I have deactivated the call to the scripts in the lyxtox file until I find a better solution (BTW, nsgmls will break with errors so it does not lend itself to SGML code tidying either). If potentially affected code is included with the help of an external entity though, then the tidy scripts might work fine for you.
The sedscr file also contains code that will add markup for key combinations:
# Key combinations
#
# CTRL-X-Y
s/\([^-]\)CTRL-\([^-. &<]*\)-\([^-. &,)<]*\)/\1\
<keycombo>\
<keycap>CTRL<\/keycap>\
<keycap>\2<\/keycap>\
<keycap>\3<\/keycap>\
<\/keycombo>\
<indexterm>\
<primary>CTRL_\2_\3<\/primary>\
<\/indexterm>\
/g
|
The above code, for example, will substitute every occurrence of the string “CTRL-X-Y” with:
<keycombo>
<keycap>CTRL</keycap>
<keycap>X</keycap>
<keycap>Y</keycap>
</keycombo>
<indexterm>
<primary>CTRL_X_Y</primary>
</indexterm>
|
thus adding the right DocBook markup for the key combination, as well as an index entry for it. There is also code for “CTRL-X” or only “CTRL”. Instead of “CTRL”, you can also have “ESC” or “ALT” - there is code for them too. The user can thus just write “CTRL-ALT-DEL” or “ESC” or “ALT-F4” and the scripts will take care of markup and indexing.
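The three-key rule can be tested in isolation. The following sketch wraps it in a shell function (the name keycombo_markup is made up for this example) and applies it to a sample sentence:

```shell
keycombo_markup() {
  # The CTRL-X-Y rule from sedscr, unchanged.
  sed -e 's/\([^-]\)CTRL-\([^-. &<]*\)-\([^-. &,)<]*\)/\1\
<keycombo>\
<keycap>CTRL<\/keycap>\
<keycap>\2<\/keycap>\
<keycap>\3<\/keycap>\
<\/keycombo>\
<indexterm>\
<primary>CTRL_\2_\3<\/primary>\
<\/indexterm>\
/g'
}

marked=$(printf '%s\n' 'Press CTRL-ALT-DEL to reboot.' | keycombo_markup)
printf '%s\n' "$marked"
```

Because the pattern starts with a group that must match one character other than "-", a key combination at the very beginning of a line is not matched by this rule.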
There is also automatic markup insertion for acronyms, product names and applications:
For acronyms:
# Acronyms
# PHP, GNU, EOF, Python, POSIX, GUI, LDP...
s/\([ .\t\r\n]\)PHP\([ .\t\r\n]\)/\1<acronym>PHP<\/acronym>\2/g
s/\([ .\t\r\n]\)GNU\([ .\t\r\n]\)/\1<acronym>GNU<\/acronym>\2/g
s/\([ .\t\r\n]\)EOF\([ .\t\r\n]\)/\1<acronym>EOF<\/acronym>\2/g
s/\([ .\t\r\n]\)Python\([ .\t\r\n]\)/\1<acronym>Python<\/acronym>\2/g
s/\([ .\t\r\n]\)POSIX\([ .\t\r\n]\)/\1<acronym>POSIX<\/acronym>\2/g
s/\([ .\t\r\n]\)GUI\([ .\t\r\n]\)/\1<acronym>GUI<\/acronym>\2/g
s/\([ .\t\r\n]\)LDP\([ .\t\r\n]\)/\1<acronym>LDP<\/acronym>\2/g
s/\([ .\t\r\n]\)IDE\([ .\t\r\n]\)/\1<acronym>IDE<\/acronym>\2/g
s/\([ .\t\r\n]\)RPM\([ .\t\r\n]\)/\1<acronym>RPM<\/acronym>\2/g
s/\([ .\t\r\n]\)PGP\([ .\t\r\n]\)/\1<acronym>PGP<\/acronym>\2/g
s/\([ .\t\r\n]\)GPG\([ .\t\r\n]\)/\1<acronym>GPG<\/acronym>\2/g
s/\([ .\t\r\n]\)ID\([ .\t\r\n]\)/\1<acronym>ID<\/acronym>\2/g
s/\([ .\t\r\n]\)BW\([ .\t\r\n]\)/\1<acronym>BW<\/acronym>\2/g
s/\([ .\t\r\n]\)ASCII\([ .\t\r\n]\)/\1<acronym>ASCII<\/acronym>\2/g
s/\([ .\t\r\n]\)CPU\([ .\t\r\n]\)/\1<acronym>CPU<\/acronym>\2/g
|
For product names:
# Product names
# UNIX, Linux, Acrobat, Windows...
s/\([ .\t\r\n]\)UNIX\([ .\t\r\n]\)/\1<productname>UNIX<\/productname>\2/g
s/\([ .\t\r\n]\)Linux\([ .\t\r\n]\)/\1<productname>Linux<\/productname>\2/g
s/\([ .\t\r\n]\)Acrobat\([ .\t\r\n]\)/\1<productname>Acrobat<\/productname>\2/g
s/\([ .\t\r\n]\)Windows\([ .\t\r\n]\)/\1<productname>Windows<\/productname>\2/g
s/\([ .\t\r\n]\)Mandrake\([ .\t\r\n]\)/\1<productname>Mandrake<\/productname>\2/g
s/\([ .\t\r\n]\)SuSE\([ .\t\r\n]\)/\1<productname>SuSE<\/productname>\2/g
|
For applications:
# Applications
# TeX, LaTeX, Acrobat Reader, PHP-Nuke
s/\([ .\t\r\n]\)TeX\([ .\t\r\n]\)/\1<application>TeX<\/application>\2/g
s/\([ .\t\r\n]\)LaTeX\([ .\t\r\n]\)/\1<application>LaTeX<\/application>\2/g
s/\([ .\t\r\n]\)Reader\([ .\t\r\n]\)/\1<application>Reader<\/application>\2/g
s/\([ .\t\r\n]\)PHP-Nuke\([ .\t\r\n]\)/\1<application>PHP-Nuke<\/application>\2/g
s/\([ .\t\r\n]\)Perl\([ .\t\r\n]\)/\1<application>Perl<\/application>\2/g
s/\([ .\t\r\n]\)Java\([ .\t\r\n]\)/\1<application>Java<\/application>\2/g
|
The principle is the same for all three: substitute any occurrence of the acronym, product name or application with the appropriate DocBook markup. For example, a string “GNU” is replaced by:
<acronym>GNU</acronym> |
(GNU is an acronym), while a “Linux” is replaced by:
<productname>Linux</productname> |
(Linux is a product name in this case) and “TeX” (an application) is replaced by
<application>TeX</application> |
It is up to you which acronyms, product names or applications you want to have automatically marked up this way (they will appear in small caps if you didn't change anything in the standard stylesheets), so feel free to add or remove your favourites from the code in sedscr.
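One property of these rules is worth checking before you add your own words: each pattern requires a delimiter character on both sides of the word. The sketch below applies the GNU rule to two made-up lines, one with the word inside the line and one with the word at the very beginning:

```shell
mark_gnu() {
  # The GNU rule from sedscr, unchanged.
  sed -e 's/\([ .\t\r\n]\)GNU\([ .\t\r\n]\)/\1<acronym>GNU<\/acronym>\2/g'
}

inline=$(printf '%s\n' 'The GNU project' | mark_gnu)
at_start=$(printf '%s\n' 'GNU tools' | mark_gnu)

printf '%s\n' "$inline" "$at_start"
```

The first line gets its markup, the second one does not, because there is no character in front of the word for the first group to match. The same holds for a word at the very end of a line.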
This concludes the transformation of LyX' SGML (specific parts were not covered here - for the Mathematics part, see Section 10.3.1, for the citations part, see Section 7.1.10.2). runsed copies the original SGML file to a backup file with the .bak ending, then writes the output of sed to a temporary file with a random name, compares the two files and, if something has changed, replaces the original file with the temporary one, otherwise outputs a warning that the file did not change.
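The behaviour of runsed described above can be sketched in a few lines of shell. This is not the actual runsed script, only an illustration of the steps listed (backup, temporary file, comparison, replacement); the function name runsed_sketch and the use of mktemp and cmp are assumptions made for this example:

```shell
runsed_sketch() {
  # $1: sed script file, $2: file to transform in place.
  script=$1
  file=$2

  # Keep a backup copy with the .bak ending.
  cp "$file" "$file.bak" || return 1

  # Write the output of sed to a temporary file with a random name.
  tmp=$(mktemp "${TMPDIR:-/tmp}/runsed.XXXXXX") || return 1
  sed -f "$script" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }

  if cmp -s "$file" "$tmp"; then
    # Nothing changed: warn and leave the original alone.
    echo "runsed_sketch: $file did not change" >&2
    rm -f "$tmp"
  else
    # Overwrite the contents, keeping the original file's permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
  fi
}

# Intended use (not executed here): runsed_sketch sedscr mydoc.sgml
```

Writing to a temporary file first is what makes the in-place edit safe: sed never reads from the file it is writing to.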
Before we continue, we have to take care of a problem that seems to be caused by a bug in the DSSSL stylesheets used: although we add a <phrase> element to the <textobject>'s (see the code in sedscr), it seems that it is not used for alt attributes in the resulting images during HTML creation. This makes the resulting HTML documents fail the HTML validation test of the W3C (see Chapter 8).
I have decided to resolve this problem with another sed script, this time a dynamic one! First, the SGML file of the document is passed to sed using sedscr_ima:
${SED} -n -f ${SEDSCRIMA} $1.sgml > ${SEDSCRIMG}
|
This produces a sed script (${SEDSCRIMG}) called sedscr_img. We create a second sed script, ${SEDSCRGRA}, which we will use in order to substitute "graphXXXX" with the real name of the graphic file in ${SEDSCRIMG}:
${SED} -n -e '/<\!ENTITY/s/.*graph\([^ ]*\) "\([^>]*\)".*>/s\/graph\1\/\2\/g/p' $1.sgml > ${SEDSCRGRA}
|
We now use the sed script ${SEDSCRGRA} (sedscr_gra) to substitute "graphXXXX" with the real name of the graphic file in ${SEDSCRIMG}[9]:
${RUNSED} ${SEDSCRGRA} ${SEDSCRIMG}
|
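The idea of a sed script that is itself produced by sed may be easier to follow with a small, self-contained example. The entity declarations and the placeholder line below are hypothetical (real LyX output may look different), and the address is written as /<!ENTITY/ without the backslash; otherwise the generating command is the one shown above:

```shell
workdir=$(mktemp -d)

# Hypothetical entity declarations mapping graphNNNN to real file names.
cat > "$workdir/doc.sgml" <<'EOF'
<!ENTITY graph0001 "paper-sizes">
<!ENTITY graph0002 "fonts">
EOF

# Turn every declaration into a rule of the form s/graphNNNN/real-name/g.
sed -n -e '/<!ENTITY/s/.*graph\([^ ]*\) "\([^>]*\)".*>/s\/graph\1\/\2\/g/p' \
  "$workdir/doc.sgml" > "$workdir/sedscr_gra"
rules=$(cat "$workdir/sedscr_gra")

# Apply the generated rules to a line that still carries a placeholder.
resolved=$(printf '%s\n' 'src="./images/graph0002.png"' | sed -f "$workdir/sedscr_gra")

printf '%s\n' "$rules" "$resolved"
rm -rf "$workdir"
```

The escaped slashes in the replacement part are what allow the output of the first sed to be a valid s command for the second one.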
This last step transforms sedscr_img, but it still contains <acronym>, <productname> and <application> tags. To erase them from the alt and title texts in the sed script ${SEDSCRIMG}, we use the sed script sedscr_apa:
${RUNSED} ${SEDSCRAPA} ${SEDSCRIMG}
|
Finally, we add the necessary sed commands for the alt and title texts of smilies.
echo 's/"\.\/images\/icon_smile\.png">/".\/images\/icon_smile.png" alt="smile" title="smile">/g' >> ${SEDSCRIMG}
echo 's/"\.\/images\/icon_wink\.png">/".\/images\/icon_wink.png" alt="wink" title="wink">/g' >> ${SEDSCRIMG}
echo 's/"\.\/images\/icon_cool\.png">/".\/images\/icon_cool.png" alt="cool" title="cool">/g' >> ${SEDSCRIMG}
echo 's/"\.\/images\/icon_eek\.png">/".\/images\/icon_eek.png" alt="shock" title="shock">/g' >> ${SEDSCRIMG}
echo 's/"\.\/images\/icon_frown\.png">/".\/images\/icon_frown.png" alt="frown" title="frown">/g' >> ${SEDSCRIMG}
|
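If you use more smilies than these five, the rules can also be generated in a loop. The following is only a sketch of that alternative, not what lyxtox does; the function name smiley_rules and the icon:text pair notation are invented for the example:

```shell
smiley_rules() {
  # Each pair is icon-name:alt-text; add your own pairs here.
  for pair in smile:smile wink:wink cool:cool eek:shock frown:frown; do
    icon=${pair%%:*}
    text=${pair#*:}
    # Emit one sed rule per smiley, in the same form as the echo lines above.
    printf 's/"\\.\\/images\\/icon_%s\\.png">/".\\/images\\/icon_%s.png" alt="%s" title="%s">/g\n' \
      "$icon" "$icon" "$text" "$text"
  done
}

# Intended use in the script (not executed here): smiley_rules >> ${SEDSCRIMG}
smiley_rules
```

The output is line for line the same as that of the five echo commands, so it can be appended to the generated script in the same way.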
Now we have computed a sed script, ${SEDSCRIMG}, that adds alt and title attributes to the images in every HTML file it is applied to.
Here is what it looks like for this document:
s/<img src="\.\/images\/general-info\.png">/<img src=".\/images\/general-info.png" alt="General document info." title="General document info.">/g
s/<img src="\.\/images\/paper-sizes\.png">/<img src=".\/images\/paper-sizes.png" alt="ISO-DIN paper sizes." title="ISO-DIN paper sizes.">/g
s/<img src="\.\/images\/insert-url\.png">/<img src=".\/images\/insert-url.png" alt="Insert URL with underscores in LyX." title="Insert URL with underscores in LyX.">/g
s/<img src="\.\/images\/page-area-model\.png">/<img src=".\/images\/page-area-model.png" alt="CSS page area model." title="CSS page area model.">/g
s/<img src="\.\/images\/fonts\.png">/<img src=".\/images\/fonts.png" alt="Document Info: Fonts." title="Document Info: Fonts.">/g
s/"\.\/images\/icon_smile\.png">/".\/images\/icon_smile.png" alt="smile" title="smile">/g
s/"\.\/images\/icon_wink\.png">/".\/images\/icon_wink.png" alt="wink" title="wink">/g
s/"\.\/images\/icon_cool\.png">/".\/images\/icon_cool.png" alt="cool" title="cool">/g
s/"\.\/images\/icon_eek\.png">/".\/images\/icon_eek.png" alt="shock" title="shock">/g
s/"\.\/images\/icon_frown\.png">/".\/images\/icon_frown.png" alt="frown" title="frown">/g
|
We will use it in a moment...
After some cleaning
# Clean previous HTML files.
rm $1/*.html
# Clean previous image files.
rm -rf $1/images
# Clean rsync backup copies.
rm -rf $1/*~
|
the document creation begins:
For the one HTML file, the steps are:
Index initialization (-N option):
$PERL $COLLATEINDEX -N -o index.sgml |
Create one HTML file. We use openjade for that. Older versions used sgmltools as follows:
$SGMLTOOLS -b onehtml -s $HTML_NOCHUNKS_DSL -j "-i output.print.png -V nochunks -V html-index" $1.sgml |
Notice that we pass "-i output.print.png -V nochunks -V html-index" to openjade through the -j option. Current versions use openjade directly:
${OPENJADE} -t sgml -d $HTML_NOCHUNKS_DSL -i output.print.png -V nochunks -V html-index $1.sgml > $1.html
|
The -i option to openjade tells it to include the output.print.png entity (see the structure of the mediaobjects put in place by runsed above), while the preamble (see Section 4.6) tells it to ignore all such entities. Since the command line option overrides all others, the output.print.png entity is included for the HTML output, while the other ones are ignored (see also Section 7.2.2).
Index creation:
$PERL $COLLATEINDEX -g -o index.sgml HTML.index |
Generation of one HTML file (the index will be included). Again, older versions used sgmltools:
$SGMLTOOLS -b onehtml -s $HTML_NOCHUNKS_DSL -j "-i output.print.png" $1.sgml |
but newer ones use openjade:
${OPENJADE} -t sgml -d $HTML_NOCHUNKS_DSL -i output.print.png -V nochunks $1.sgml > $1.html
|
Tidy the HTML code:
$TIDY -ascii -c -wrap 200 -f /dev/null -m $1.html |
Correct header and footer. First, split the HTML document in title and body parts. The title will be put in title.tmp, the body in body.tmp:
$HTMLSPLIT < $1.html |
Second, put the right header and footer in the file (see Chapter 8):
HTMLFILE=$1.html
BASENAME=`basename $HTMLFILE`
cat ${DATADIR}/part1 > ${HTMLFILE}
cat title.tmp >> ${HTMLFILE}
echo '</title>' >> ${HTMLFILE}
cat meta.tmp >> ${HTMLFILE}
|
Substitute the placeholders DOMAIN, DIRNAME, FILENAME etc. in the header (part2) and footer file (part3) with the current values:
# Header
${SED} -e "s/_DOMAIN_/${DOMAIN}/g" ${DATADIR}/part2 > part2_1.tmp
${SED} -e "s/_DIRNAME_/$1/g" part2_1.tmp > part2_2.tmp
${SED} -e "s/_FILENAME_/${BASENAME}/g" part2_2.tmp > part2_3.tmp
${SED} -e "s/_TITLE_/${TITLE}/g" part2_3.tmp > part2_4.tmp
${SED} -e "s/_FORMATSFILE_/${FORMATSFILE}/g" part2_4.tmp > part2_5.tmp
${SED} -e "s/_COPYRIGHT_/${COPYRIGHT}/g" part2_5.tmp > part2_6.tmp
${SED} -e "s/_HOMEFILE_/${HOMEFILE}/g" part2_6.tmp > part2_7.tmp
${SED} -e "s/_DATE_/${TODAY}/g" part2_7.tmp > part2.tmp
cat part2.tmp >> ${HTMLFILE}
# Body
cat body.tmp >> ${HTMLFILE}
# Footer
${SED} -e "s/_DOMAIN_/${DOMAIN}/g" ${DATADIR}/part3 > part3_1.tmp
${SED} -e "s/_DIRNAME_/$1/g" part3_1.tmp > part3_2.tmp
${SED} -e "s/_FILENAME_/${HTMLFILE}/g" part3_2.tmp > part3_3.tmp
${SED} -e "s/_TITLE_/${TITLE}/g" part3_3.tmp > part3_4.tmp
${SED} -e "s/_FORMATSFILE_/${FORMATSFILE}/g" part3_4.tmp > part3_5.tmp
${SED} -e "s/_COPYRIGHT_/${COPYRIGHT}/g" part3_5.tmp > part3_6.tmp
${SED} -e "s/_HOMEFILE_/${HOMEFILE}/g" part3_6.tmp > part3_7.tmp
${SED} -e "s/_DATE_/${TODAY}/g" part3_7.tmp > part3.tmp
cat part3.tmp >> ${HTMLFILE}
|
If you have set the values of TITLE, FORMATSFILE, HOMEFILE etc. in your .start file (see Section 4.11) and you use them somewhere in your part* files, they will be replaced with those values too. The same is true for DATE: you can use it in your headers and footers to produce the timestamp automatically.
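The chain of sed calls with intermediate part2_N.tmp files can also be written as a single sed invocation with several -e options. The sketch below shows this for three of the placeholders; the function name fill_placeholders and the sample values are made up, and it assumes that the values contain neither "/" nor "&", which have a special meaning in the s command:

```shell
fill_placeholders() {
  # $1: template file; the values are taken from shell variables.
  sed -e "s/_DOMAIN_/${DOMAIN}/g" \
      -e "s/_TITLE_/${TITLE}/g" \
      -e "s/_DATE_/${TODAY}/g" \
      "$1"
}

# Hypothetical values and a tiny template, for illustration only.
DOMAIN='www.example.org'
TITLE='My Template'
TODAY='2007-09-24'
template=$(mktemp)
printf '%s\n' '<h1>_TITLE_</h1>' '<p>_DOMAIN_, _DATE_</p>' > "$template"

header=$(fill_placeholders "$template")
printf '%s\n' "$header"
rm -f "$template"
```

If one of your values may contain a slash (a URL, for example), choose another delimiter for that rule, such as s|_HOMEFILE_|...|g.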
Add alt and title attributes to images. We use the sed script sedscr_img, that we computed in Section 7.1.4.5:
# Add alt and title tags to the images.
${RUNSED} ${SEDSCRIMG} ${HTMLFILE}
|
Finally, do some housekeeping, removing all intermediate files:
# Housekeeping
rm -f body.tmp title.tmp meta.tmp part2*.tmp part3*.tmp
|
For the HTML output with many files (chunks), the procedure is analogous to the above, so I will not repeat it here.
For the print formats, the index is recreated (we need page numbers instead of HTML links). Notice the -p option to collateindex and the use of the saved copy of HTML.index in the current directory - this is because the raw index data are generated with the HTML stylesheet, even for the print formats (see Section 7.1.11):
rm index.sgml
$PERL $COLLATEINDEX -p -g -o index.sgml HTML.index
|
and the images directory is copied under $1 (the myTemplate directory, in our example):
cp -av images $1/ |
For the PDF output, the steps are:
Generate the PDF document in a first pass:
$OPENJADE -t tex -d $PRINT_PDF_DSL -o $1.tex -i "output.print.pdf" $1.sgml |
Notice that now only the output.print.pdf entities in the mediaobjects are included (see the discussion of this for the HTML output above, as well as in Section 7.2.2).
The generated PDF in the 1st pass does not have thumbnails yet. Generate thumbnails now (do not confuse the script THUMB_PDF (thumbpdf) with the environment variable THUMBPDF, which passes additional options to the THUMB_PDF script ):
$THUMB_PDF $1 |
Generate PDF again (2nd pass), to incorporate the thumbnails. The following two commands are equivalent to
$SGMLTOOLS -b pdf -s sgmltools-pdf -j "-i output.print.pdf" $1.sgml |
(up to the use of the stylesheet, that is), but we have to run them separately, because otherwise we cannot process the file produced by thumbpdf: sgmltools will always want a filename of the form @jobname.tpt, where jobname is its PID (process id), which is difficult to guess. thumbpdf, on the other hand, will produce $1.tpt. So we have to "simulate" sgmltools with the following two commands:
This will produce a tex file from the SGML source:
$OPENJADE -t tex -d $PRINT_PDF_DSL -o $1.tex -i "output.print.pdf" $1.sgml |
This will produce a PDF file from the tex file. This PDF file will have thumbnails!
$PDFJADETEX $1.tex |
We must call pdfjadetex a second time, because the first time there was no .aux file and the bookmarks were not created:
$PDFJADETEX $1.tex |
A third pass of pdfjadetex is needed, in order to get the page numbers in the Table of Contents computed. See the PRINT_PDF_DSL used above for the parameters that control printing and placement of ToC:
$PDFJADETEX $1.tex |
Our PDF document, with all its bells and whistles, is now ready!
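Since the same command has to be run three times, and the later passes only make sense if the earlier ones succeeded, the repetition can be wrapped in a small helper. This is a sketch only; the name run_passes is invented, and the demonstration uses a harmless stand-in command instead of pdfjadetex so that it runs anywhere:

```shell
run_passes() {
  # $1: number of passes; the remaining arguments are the command to run.
  passes=$1
  shift
  i=1
  while [ "$i" -le "$passes" ]; do
    # Stop at the first failing pass instead of running the rest.
    "$@" || return 1
    i=$((i + 1))
  done
}

# Intended use in the script (not executed here): run_passes 3 $PDFJADETEX $1.tex

# Demonstration with a stand-in command that appends one line per pass.
count_file=$(mktemp)
run_passes 3 sh -c 'echo pass >> "$0"' "$count_file"
passes_done=$(wc -l < "$count_file" | tr -d ' ')
rm -f "$count_file"
printf '%s\n' "$passes_done"
```

The same helper would serve for the three jadetex passes of the PS output.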
The RTF and TXT outputs are easy. For the RTF, we just have to do:
$OPENJADE -t rtf -d $PRINT_RTF_DSL -i "output.print.bmp" $1.sgml |
and for the TXT:
$LYNX -dump -nolist $1.html > $1.txt |
i.e. we use the Lynx text browser with the -dump option to create a text version from the one, big HTML file we created previously.
For the PS output, we have a little more work: we have to set the printer to "cmz", so that dvips (which will be called either through sgmltools, as in older versions of the script, or directly, as in newer ones) will search for the file config.cm (located in /var/lib/texmf/dvips/config/config.cm on my system), which contains the mappings for the "Computer-Modern" fonts. We use "cmz" instead of "cm" in order to embed the font in the PS file, thus making it portable (there is also a file config.cmz):
PRINTER="cmz"
export PRINTER
$OPENJADE -t tex -d $PRINT_PS_DSL -o $1.tex -i "output.print.eps" $1.sgml
# Compress PS
$GZIP $1.ps
|
As with PDFJADETEX in Section 7.1.4.7, again 3 passes are necessary:
$JADETEX $1.tex
$JADETEX $1.tex
$JADETEX $1.tex
|
An equivalent command would be
$SGMLTOOLS -b ps -s sgmltools-ps -j "-i output.print.eps" $1.sgml |
but you would have to use the right stylesheet (in this invocation, the sgmltools-ps stylesheet is used, which is mapped, through the /etc/sgml/aliases file to "-//SGMLtools//DOCUMENT Docbook Style Sheet for Print//EN"#print.ps, which in turn is mapped, through the sgmltools catalog file /usr/share/sgml/stylesheets/sgmltools/sgmltools.cat, to print.dsl#print.ps, i.e. the print.ps id of the print.dsl file in the same directory where sgmltools.cat is also located - phew...).
Our documents are now all ready. What follows is just general housekeeping: move all documents into the HTML directory and remove the .tpt file. Remove all .pdf, .eps, .gif and .jpg images from ./images under $1 (myTemplate in our example; ./images in the current directory is not affected!). From what has been created, leave only $1.sgml and index.sgml in the current (working) directory:
mv $1.txt $1.rtf $1.pdf $1.ps.gz $1/
mv $1.html $1/
rm $1.tpt $1.log $1.aux $1.out $1.tex
rm $1/images/*.pdf $1/images/*.eps $1/images/*.gif $1/images/*.jpg
rm $1/images/*/*.pdf $1/images/*/*.eps $1/images/*/*.gif $1/images/*/*.jpg
cp $1.sgml $1/
|
You may continue with the processing, calling tar to create various archives, or sed to further tweak the HTML code of the files. I leave these steps as an example for the interested reader. If you don't need this special processing, you should comment it out!
| [1] |
I have inserted some blanks in the code snippet in order to prevent my own scripts (sedscr!) from matching and changing a code that was meant to be an example ;-) |
| [2] to [8] |
Same remark as in footnote [1]. |
| [9] |
Yup, we use a sed script to change another sed script... |
| Last updated Mon Sep 24 01:19:25 CEST 2007 | Permalink: http://www.karakas-online.de/mySGML/explain-main-part.html | All contents © 2002-2007 Chris Karakas |