So here you are, after 200 pages of information bombardment, your eyes wide open in expectation of the final firework. “Will it work with my language?”, you ask. This chapter tries to answer that question.
FIXME: Add: What is internationalization, what localization?
Here is “the problem at hand”, as described vividly in Messing about with Unicode, XML, XSL, DSSSL, Tex, Omega, Fop and the rest of the mess:
Let's say you have a source of data that is going to be published. The data comprises text from many, many languages. English. Dutch. Chinese, sure. Malayalam, perhaps. Tibetan, of course. And it contains some pretty weird symbols. Like a shwa -- a topsy-turvy e: ə. In order to subsume all this data in one character encoding, everything is encoded in Unicode: the current standard for multi-lingual, unified text encoding.
You would like to print your text, publish your text on the web, and perhaps also to prepare it for further editing in a word-processing package. And you don't want to lose your Chinese characters, your IPA signs, or your mathematical symbols.
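To make the quoted problem a little more concrete, here is a minimal Python sketch. It is not part of the toolchain described in this book; the sample characters and the function name `encoded_lengths` are assumptions chosen purely for illustration. It reports how many bytes one character needs in a few encodings, and which encodings cannot represent it at all.

```python
# One sample character per script mentioned in the quotation (illustrative choices).
SAMPLES = {
    "English": "e",
    "schwa (IPA)": "\u0259",
    "Chinese": "\u4e2d",
    "Malayalam": "\u0d2e",
    "Tibetan": "\u0f56",
}


def encoded_lengths(text, codecs=("utf-8", "utf-16-le", "latin-1")):
    """Return {codec: byte length}, with None where the codec cannot encode text."""
    lengths = {}
    for codec in codecs:
        try:
            lengths[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # The legacy charset simply has no slot for this character.
            lengths[codec] = None
    return lengths


if __name__ == "__main__":
    for name, char in SAMPLES.items():
        print(name, encoded_lengths(char))
```

Only the plain English letter survives in Latin-1; the Unicode encodings can hold every sample, which is the reason the quoted author settles on Unicode for the source data.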
Work in progress!
This chapter is work in progress, so the information in it is incomplete - and may even be inaccurate! Many problems may have disappeared, as distributions made the transition to UTF-8. Others may have surfaced. If you have any hints regarding this complex subject, please contact me, or post in my Linux Forum.
One thing that makes this endeavour so difficult is that it consists of many, many steps, each with its own inputs, outputs and tools. To understand the process, we have to dissect it into those steps:
You type something on the console with your keyboard. Each key on your keyboard carries a symbol on it, but this does not mean that when you press it you will see that symbol on the screen. What you see on the screen depends on the keyboard mapping and your console fonts, see Section 11.1. In a GUI environment (and an X terminal, instead of a text console), the GUI may apply its own mappings and LyX must know about them too. See Section 11.5, Section 11.6.2.
Finally, some hexadecimal code arrives at LyX as a result of your key press. Will LyX display what you expect? This depends on LyX's language configuration, see Section 11.6.1.
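Why the same code can show up as different symbols is easy to demonstrate outside LyX. The following Python sketch is only an illustration and says nothing about how LyX works internally; the helper name `interpret` and the three charsets are assumptions picked for the example. It decodes one and the same byte under several legacy 8-bit charsets.

```python
def interpret(raw, charsets=("latin-1", "iso8859_7", "iso8859_5")):
    """Decode the same bytes under several charsets and return {charset: text}."""
    return {name: raw.decode(name) for name in charsets}


if __name__ == "__main__":
    # The single byte 0xE4: a-umlaut in Latin-1, delta in Greek, ef in Cyrillic.
    for charset, text in interpret(b"\xe4").items():
        print(charset, text)
```

The byte never changes; only the assumed charset does. This is why the keyboard mapping, the font and the application's language configuration all have to agree.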
Suppose you managed to type something and see it displayed correctly in LyX in the language of your choice. You save the document and run lyxtox as in Section 5.21. This will kickstart an avalanche of commands, which will ultimately do the following:
Export the document from LyX to DocBook SGML. Depending on its language configuration, LyX will
use the right encoding for the symbols, letters etc. you typed and
set the language attribute (“lang”) in the article or book element of the exported file.
For language set to English, this will produce the lines
<article lang="en"><!-- DocBook file was created by LyX 1.2 See http://www.lyx.org/ for more information -->
near the top of the exported document. This information is used by the DSSSL stylesheets, see Section 11.9.
sed and awk will correct the exported SGML document. Will sed and awk understand your Dutch and Chinese? Or your Malayalam? See Section 11.2 and Section 11.3.
Openjade will transform the corrected SGML document to various formats, including HTML, TeX and RTF. Will Openjade understand the encoding of the input file? How will Openjade represent it internally? And which encoding will Openjade use for the output in HTML, TeX and RTF? See Section 11.7.
Lynx will read the HTML (one file) version and produce the TXT out of it. Will lynx produce a correct TXT document from the (hopefully correct) HTML? See Section 11.11.
Perl scripts transform the TeX version. Will Perl work as expected in the languages of our example? See Section 11.4.
Pdfjadetex transforms the TeX source into a PDF document. Will pdfjadetex understand that its input TeX source is in some other encoding? How can we tell it? Jadetex transforms the TeX source (actually a different TeX source from the above, produced by openjade using the lyxtox-print-ps.dsl DSSSL stylesheet) to a DVI document. Will jadetex understand that its input TeX source is in some other encoding? How can we tell it? Pdfjadetex and jadetex are TeX macro packages and as such they use the underlying TeX installation. Will TeX cope with all those languages? See Section 11.10.
dvips reads the DVI document produced by jadetex and outputs a PS file. What must dvips know about localization to work with our multilanguage document? See Section 11.8.
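Since the “lang” attribute written during the export step influences what the DSSSL stylesheets do later, it can be worth checking it before the rest of the chain runs. Here is a minimal Python sketch, assuming the exported file starts much like the example line shown in the export step; `find_lang` is a hypothetical helper, and a regular expression is only a rough stand-in for a real SGML parser.

```python
import re

# Matches an opening <article ...> or <book ...> tag and captures its lang value.
_LANG_RE = re.compile(
    r'<(article|book)\b[^>]*?\blang\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)


def find_lang(sgml_text):
    """Return (element name, lang value), or None if no lang attribute is found."""
    match = _LANG_RE.search(sgml_text)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


if __name__ == "__main__":
    line = '<article lang="en"><!-- DocBook file was created by LyX 1.2 -->'
    print(find_lang(line))
```

A result of `None` would suggest that the document language was not set in LyX before exporting.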
Will openjade produce the right HTML, TeX and RTF? Will pdfjadetex produce a PDF file in the right encoding? With the right fonts embedded in the document? Will jadetex produce a DVI document with the right encoding and the right fonts? Will dvips produce a correct PS document from the localized DVI? Will lynx produce the right TXT from the localized HTML? What are the open problems related to localization? See Section 11.12.
Questions upon questions... Now let's put the puzzle pieces together!
LyX User Guide
The following sections are taken from the LyX User Guide:
| Last updated Mon Sep 24 01:19:25 CEST 2007 | Permalink: http://www.karakas-online.de/mySGML/localization.html | All contents © 2002-2007 Chris Karakas |