SGML math code as exported by LyX is, once again, not perfect, and the first step consists in correcting it, just as we did in Section 7.1.4.1. We will change the SGML math code to fit our needs, again using runsed and sedscr. But this time we are going to need an awk script too, awkscr_math. Section 10.3.1.2 explains why.
Regarding Mathematics code, there are four distinct problems in LyX's SGML:
LyX produces only an <alt>/</alt> tag pair (with the TeX code in it) and a <math>/</math> tag pair with the MathML representation of the equation. This is not enough. The <graphic fileref="equationimagefile"> is missing. Openjade will complain that the end tag for <equation> was reached, but the element was “not complete”. And of course it is right, since the <alt> tag was meant as a textual representation of the image (see Section 10.4), while the <graphic> tag is expected to hold the visual representation of it in the form of some image file. Clearly, we will have to create the <graphic fileref=...> tags from scratch. That's the first problem.
The second problem is that LyX exports <equation> even for inline equations! The right tag would be <inlineequation> for inline equations (see inlineequation), <equation> for an equation with title (see equation) and <informalequation> for one without (see informalequation). This is very unfortunate, because it makes all equations “displayed”, i.e. drawn on a separate line. That's the second problem.
The third problem is that the MathML code it produces cannot be dealt with by openjade (at least not with the standard modules I have on my system) and thus produces parse errors. Furthermore, we are not going to need it, since the TeX code inside the <alt> tags will suffice completely for us. We will thus have to delete everything between the <math>/</math> tags. That's the third problem.
The fourth problem concerns inequalities in Math Mode. Just writing an inequality chain such as A<B>C will produce an error like
E: element "B" undefined
because the parser will see the angle brackets around B and think that <B> is an SGML element. That's the fourth problem.
We will start with the second problem from Section 10.3.1.1 above, since it is the only one that needs both sed and awk to be solved. It is also a nice example of a problem that cannot be solved with only one of them[1]. To solve it, one observation was crucial: whenever a displayed equation occurs, LyX ends the preceding line with </para><para>. The idea is therefore to change the <equation> tag that follows a line ending in </para><para> to something other than <equation>, so that we can safely say that the remaining <equation> tags actually denote inline equations and change them accordingly.
The following code in sedscr checks if the line ends in </para><para> and if so, gets the next line in the pattern space (N command). It then changes the <equation> to <informalequation>[2], prints the line and deletes the pattern space completely:
/<\/para><para>$/{
N
s/[ \t\n\r]*<equation>/\
<informalequation>\
/
p
d
}
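As a rough illustration, the marking step can be tried on a small made-up fragment. The sample input below is an assumption, not actual LyX output, and the sed expression is a simplified variant of the one above: it skips the whitespace handling and relies on sed's automatic printing instead of the p and d commands.

```shell
# Hypothetical sample: one displayed equation (preceded by a line ending
# in </para><para>) and one inline equation.
input='<para>Text before a displayed equation.</para><para>
<equation>
<alt>\[ a=b \]</alt>
</equation>
An inline one: <equation>
<alt>$c$</alt>
</equation>'

# On a line ending in </para><para>, pull in the next line (N) and
# rename the first <equation> found in the joined pattern space.
marked=$(printf '%s\n' "$input" | sed '/<\/para><para>$/{
N
s/<equation>/<informalequation>/
}')

printf '%s\n' "$marked"
```

Only the <equation> that directly follows the </para><para> line is renamed; the inline one is left for awk to handle.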
The rest is accomplished in the awk script awkscr_math. We know by now that whatever <equation> tags remain denote the start of inline equations. We thus change <equation> to <inlineequation> and </equation> to </inlineequation>, but only between <equation>/</equation> tags. This is done by the following code in awkscr_math:
# Only lines from an <equation> line up to its </equation> line are touched.
/<equation>/,/<\/equation>/{
gsub(/<equation>/,"<inlineequation>")
gsub(/<\/equation>/,"</inlineequation>")
}
Finally, we can transform whatever <informalequation> tags remain back to <equation>:
# Turn the marker back into <equation>, adding a numbered id and title.
/<informalequation>/,/<\/equation>/{
gsub(/<informalequation>/,"<equation id=\"eq" ++num_eq "\"> <title>(eq" num_eq ")</title>")
}
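The two awk steps can be sketched together on the output of the sed stage. The input below is a made-up fragment, and the counter is incremented in a separate statement (a small deviation from the listing above) so that the result does not depend on how a given awk orders the evaluation of ++num_eq within the concatenation.

```shell
# Hypothetical fragment as it might look after the sed marking step.
marked='<informalequation>
<alt>\[ a=b \]</alt>
</equation>
An inline one: <equation>
<alt>$c$</alt>
</equation>'

result=$(printf '%s\n' "$marked" | awk '
# Whatever <equation> ... </equation> blocks remain are inline equations.
/<equation>/,/<\/equation>/ {
    gsub(/<equation>/, "<inlineequation>")
    gsub(/<\/equation>/, "</inlineequation>")
}
# Marked blocks become numbered display equations again.
/<informalequation>/ {
    num_eq++
    sub(/<informalequation>/, "<equation id=\"eq" num_eq "\"> <title>(eq" num_eq ")</title>")
}
{ print }')

printf '%s\n' "$result"
```

The displayed equation keeps its closing </equation> and gains the id eq1 with the title (eq1), while the inline block ends up wrapped in <inlineequation>/</inlineequation>.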
By the way, we also did something that is impossible to do in sed: we added a dynamic id and title, both composed of the string “eq” and a dynamically increased counter. The id is the same as the “label” in LyX, the title is what will be displayed in the equation title. By letting the title be equal to the id, we are able to see what id an equation has in the SGML code (because it will be displayed in the title) and set the LyX label for that equation to be the same (see Section 10.2), thus making cross-references to equations possible[3].
The first problem, creating the <graphic fileref=...> tags, requires a decision: which filename to take? A first idea, to use the label from the TeX code between the <alt> tags, as long as there is one, is not viable: what if the TeX code describes more than one equation (an eqnarray), each one labeled with its own label? I decided to use random filenames in all situations, rather than running into such problems. Due to the random part, such a substitution calls for awk rather than sed.
The random numbers, which will be the filenames to use, are currently drawn between 10000 and 20000. If you want to change these limits, you can do it in the BEGIN part of awkscr_math:
BEGIN {
num_min = 10000
num_max = 20000
num_ran = 0
num_dif = num_max - num_min
num_eq = 0
srand()
}
But sometimes this randomness is not needed:
Note: How to get identical filenames for the equations from run to run
If you want the same sequence of random numbers each time you run the script (thus producing the same filenames from run to run), you should comment out the call to the seed function srand().
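The listing above does not show how num_ran is actually drawn, so the following is only a sketch under an assumption: that it is computed as int(num_min + rand() * num_dif). It also passes an explicit seed to srand(), a variation on the note above, since some awk implementations reportedly seed the generator on their own at startup, in which case merely commenting out srand() would not give repeatable numbers. The helper name draw is hypothetical.

```shell
# draw SEED: print one candidate filename number for the given seed.
draw() {
  awk -v seed="$1" 'BEGIN {
    num_min = 10000
    num_max = 20000
    num_dif = num_max - num_min
    srand(seed)                                # fixed seed: repeatable sequence
    num_ran = int(num_min + rand() * num_dif)  # assumed formula
    print num_ran
  }'
}

first=$(draw 42)
second=$(draw 42)
echo "$first $second"
```

With the same seed both calls should print the same number, somewhere between 10000 and 19999; the exact value depends on the awk implementation.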
The following code creates the <graphic fileref=...> tag with a random filename in the fileref attribute after the closing </alt> tag:
# Build the replacement piece by piece; awk strings cannot span lines.
repl = "</alt>\n" \
"<![ %output.print.png; [\n" \
"<graphic fileref=\"images/math/" num_ran ".png\">\n" \
"]]>\n" \
"<![ %output.print.pdf; [\n" \
"<graphic fileref=\"images/math/" num_ran ".png\">\n" \
"]]>\n" \
"<![ %output.print.eps; [\n" \
"<graphic fileref=\"images/math/" num_ran ".png\">\n" \
"]]>\n" \
"<![ %output.print.bmp; [\n" \
"<graphic fileref=\"images/math/" num_ran ".bmp\">\n" \
"]]>\n"
gsub(/<\/alt>/, repl)
More precisely, it substitutes </alt> with something like
</alt>
<![ %output.print.png; [
<graphic fileref="images/math/10404.png">
]]>
<![ %output.print.pdf; [
<graphic fileref="images/math/10404.png">
]]>
<![ %output.print.eps; [
<graphic fileref="images/math/10404.png">
]]>
<![ %output.print.bmp; [
<graphic fileref="images/math/10404.bmp">
]]>
Some remarks:
The number used for the filename (10404) is randomly generated (num_ran, in the previous code example).
The directory for the images of the equations is images/math. If you need to change it, you will have to do so everywhere in awkscr_math.
We again make use of the output.print.xxx entities to denote code that has to be IGNOREd or INCLUDEd, depending on the format we are rendering, see Section 7.2.2 and Section 7.1.4.1.
We specify a PNG file even for PDF and PS processing. That's not important. These formats will not take into account the <graphic> element when processed (the stylesheets will take care of this). Nevertheless, we must put a <graphic> tag with some filename there, otherwise openjade will complain.
The important information is that the file 10404.png shall be used to display the equation when the output.print.png entity is included (i.e. only in HTML) and that 10404.bmp shall be used when the output.print.bmp entity is included (i.e. only in RTF).
The above substitution takes place in the following 4 situations (which cover all math situations in LyX's SGML) in awkscr_math:
Between “<alt>\[” and “</alt>”.
Between “<alt>$” and “</alt>”.
Between “<alt>\begin{equation}” and “</alt>”.
Between “<alt>\begin{eqnarray}” and “</alt>”.
This solves our first problem.
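How the four situations gate the substitution can be sketched as follows. This is not the code from awkscr_math: it uses plain index() string matching and a flag instead of regular-expression range patterns (to sidestep the escaping of \[, $ and braces), it inserts a single <graphic> tag rather than the four marked sections, and it uses a running counter n in place of the random number. The function opens_math and the sample input are hypothetical.

```shell
sample='<alt>\[
a=b
\]</alt>
<alt>$c$</alt>
<para>no math here</para>'

tagged=$(printf '%s\n' "$sample" | awk '
# Returns 1 if the line opens one of the four math situations.
function opens_math(line) {
    return index(line, "<alt>\\[") || index(line, "<alt>$") ||
           index(line, "<alt>\\begin{equation}") ||
           index(line, "<alt>\\begin{eqnarray}")
}
opens_math($0) { in_math = 1 }
# Inside a math block, the closing </alt> gets a <graphic> tag appended.
in_math && index($0, "</alt>") {
    n++   # placeholder for the random filename number
    sub(/<\/alt>/, "</alt>\n<graphic fileref=\"images/math/" n ".png\">")
    in_math = 0
}
{ print }')

printf '%s\n' "$tagged"
```

Both the multi-line \[ ... \] block and the one-line $...$ block receive a <graphic> tag after their </alt>, while the plain paragraph is passed through untouched.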
While we are at it, we substitute the “<” and “>” symbols (that appear in inequalities between the alt tags) with their SGML entities[4]:
if ( $1 != "<alt>\\[" && $1 != "</alt>" ) {
# "\\&" yields a literal ampersand in the gsub replacement
gsub("<"," \\&lt; ")
gsub(">"," \\&gt; ")
}
When texmath2pngbmp.pl is executed, it will see those entities and will substitute the characters they stand for back in:
sub unescape {
$eqn =~ s/&amp;/&/g;
$eqn =~ s/&gt;/\>/g;
$eqn =~ s/&lt;/\</g;
}
This solves the fourth problem.
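The round trip can be checked on a one-line example. The sketch below escapes the angle brackets with awk, as above but without the surrounding blanks and without the IF guard, and then undoes the escaping with sed, which merely stands in here for the Perl routine in texmath2pngbmp.pl.

```shell
# Escape: "\\&" in the awk string becomes \& and thus a literal ampersand.
escaped=$(printf '%s\n' 'a < b > c' | awk '{
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    print
}')

# Unescape (sed used as a stand-in for the Perl code).
restored=$(printf '%s\n' "$escaped" |
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g')

printf '%s\n' "$escaped" "$restored"
```

The escaped form contains no bare angle brackets, so the SGML parser no longer mistakes part of an inequality for a tag, and the original TeX text is recovered before the image is rendered.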
The third problem is easier to solve: the following code in awkscr_math will substitute everything between <math> and </math> with the empty string (thus creating empty lines):
/<math>/,/<\/math>/{
gsub(".*","")
}
That empty lines do not get printed is easily seen from the last code block in awkscr_math:
!/^$/{
print
}
which will print every line that is not empty.
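Taken together, the last two rules behave like a small filter. A minimal sketch on made-up input, assigning the empty string to $0 directly instead of calling gsub(".*",""):

```shell
stripped=$(printf '%s\n' '<alt>$x$</alt>' '<math>' '<mi>x</mi>' '</math>' '<para>end</para>' | awk '
# Blank out every line from <math> up to and including </math>.
/<math>/,/<\/math>/ { $0 = "" }
# Print only the lines that are not empty.
!/^$/ { print }')

printf '%s\n' "$stripped"
```

Only the <alt> line and the closing paragraph survive; the three MathML lines are dropped. Note that this filter also drops any line that was already empty in the input.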
[1] At least not easily: we would probably need two consecutive invocations of sed, or employ some complicated branching. Contrary to my usual predilection, I chose to make it as simple as possible, rather than complex and wonderful. ;-) The next LyX release may render it obsolete anyway.
[2] It doesn't matter what you change it to, as long as it is different from <equation>.
[3] You can cross-reference an equation only if you previously set a label for it, but the LyX label cannot (and will not) be exported to SGML, since it refers to a line in a possibly multi-line equation: what id should then be exported to SGML for an equation with three lines, all carrying a label in LyX?
[4] The IF statement in the code avoids the substitution for the brackets that surround the alt tags themselves.
| Last updated Mon Sep 24 01:19:25 CEST 2007 | Permalink: http://www.karakas-online.de/mySGML/explain-sgml-math-code-correction.html | All contents © 2002-2007 Chris Karakas |