How to convert CHM to PDF with chm2pdf in Linux


chris
Dark Lord of the Sith


Joined: 10 May 2003
Posts: 6267
Location: Outer Space

Posted: Wed Nov 14, 2007 7:17 pm    Post subject: How to convert CHM to PDF with chm2pdf in Linux

DOWNLOAD THE SCRIPT HERE: chm2pdf-0.9.tar.gz

I have long been searching for a Linux utility that converts CHM (Windows HTML Help) files to PDF. I finally found a Python script at

http://code.google.com/p/chm2pdf/

that promised to do the job.

I tried it and it worked surprisingly well. However, it was still not perfect and contained some bugs. As it was released under the GPL, I was able to modify the code. Actually, I modified it so much that its original author, Massimo Sandal, will have trouble recognizing it!

Here is a list of my changes:

  • Added version information:
    Code:

    CHM2PDF v. 0.9
  • Added standard GNU License text:
    Code:

        This program is free software: you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation, either version 2 of the License, or
        (at your option) any later version.
       
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.

        You should have received a copy of the GNU General Public License
        along with this program.  If not, see <http://www.gnu.org/licenses/>.
  • Added my name to copyright notice.
  • Changed TEMP_DIR and TEMP_OUT to CHM2PDF_TEMP_WORK_DIR and CHM2PDF_TEMP_ORIG_DIR. We now have:
    Code:

    CHM2PDF_TEMP_WORK_DIR='/tmp/chm2pdf/work'
    CHM2PDF_TEMP_ORIG_DIR='/tmp/chm2pdf/orig'

    These are the only user-configurable variables in the script. You can change them to whatever you like - no other changes are needed.
  • From CHM2PDF_TEMP_WORK_DIR and CHM2PDF_TEMP_ORIG_DIR, chm2pdf computes its orig and work dirs by appending the basename of the CHM file (without the .chm ending). So if you give it /home/chris/my-file.chm to convert, it will use /tmp/chm2pdf/work/my-file as work directory and /tmp/chm2pdf/orig/my-file as orig directory. It expands the files contained in the CHM file into the orig directory, then copies the ones it needs into its work dir and continues there. The original TEMP_DIR and TEMP_OUT simply did not work out of the box.
  • Added check for "SRC" in
    Code:

    if key=='src' or key=='SRC':

    line of ImageCatcher(). Some HTML files use tags in capital letters.
  • Added a function correct_file() and moved the corrections into it. Now we not only correct the image URLs, but also delete unwanted elements with:
    Code:

        # Delete unwanted HTML elements.
        # Raw strings (r'...') keep the backslashes intact for the regex engine.
        page=re.sub(r'<div .*teamlib\.gif.*\/div>','',page)
        page=re.sub(r'<a href.*next\.gif[^>]*><\/a>','',page)
        page=re.sub(r'<a href.*previous\.gif[^>]*><\/a>','',page)
        page=re.sub(r'<a href.*prev\.gif[^>]*><\/a>','',page)
        page=re.sub(r'"[^"]*previous\.gif"','""',page)
        page=re.sub(r'"[^"]*prev\.gif"','""',page)
        page=re.sub(r'"[^"]*next\.gif"','""',page)

    These delete the usual navigation icons - things that we really don't need in a PDF! If you are versed in regular expressions, you can add your own rules here. In a later version, this should be moved to an external rules file.
  • In function convert_to_pdf(), I changed the signature, adding the options hash (more on the options later). I added code that creates the orig and work dirs:
    Code:

        try:
            os.mkdir(CHM2PDF_TEMP_WORK_DIR)
        except OSError: # The directory already exists.
            pass

        try:
            os.mkdir(CHM2PDF_TEMP_ORIG_DIR)
        except OSError: # The directory already exists.
            pass

        try:
            os.mkdir(CHM2PDF_ORIG_DIR)
        except OSError: # The directory already exists.
            pass

        try:
            os.mkdir(CHM2PDF_WORK_DIR)
        except OSError: # The directory already exists.
            pass


    as well as code that processes a so-called "titlefile". The idea here is that you can give the simple name, like "toc.html", of a file that you know contains the table of contents. This file may not be part of the CHM description itself, so it may be missed by the parsing routines in PageLister(). If found, this file is corrected like the other ones and is later passed to htmldoc as the value of the --titlefile option. All this happens, of course, transparently to the user.
  • I added a lot of sanity checks: for example, filenames returned by the parsing routines do not always exist (mostly because an extra anchor has been appended).
  • I correct *all* links in the HTML files in the work directory, not only the image URLs. Here is how:
    Code:

                # Escape slashes in url.
                url_filename_escaped = re.sub('/', '\/', os.path.basename(url))
                # Escape dots in url.
                url_filename_escaped = re.sub('\.', '\.', url_filename_escaped)
                # Escape slashes in htmlout_filename.
                htmlout_filename_escaped = re.sub('/', '\/', os.path.basename(htmlout_filename))
                # Compute a "garbled" htmlout_filename, where dots are simply replaced with underscores.
                htmlout_filename_escaped_garbled = re.sub('\.', '_', htmlout_filename_escaped)

                # Build a list for each of the three strings (the original URL, the output filename and the garbled one).
                # The idea is that we want to replace the match_strings with the corresponding replace_garbled_strings first.
                # Then, in a second pass, we will replace the garbled strings with the "real" replace_strings.
                # This trick is necessary to avoid problems in cases where the original URLs look like
                #
                # 0001.html, 0002.html, 0003.html...
                #
                # and we want to replace as follows:
                #
                # toc.html  -> temp0001.html
                # 0001.html -> temp0002.html
                # 0002.html -> temp0003.html
                # 0003.html -> temp0004.html
                #
                # If we try it "directly", i.e. without the "garbled" names first, we will end up changing:
                #
                # toc.html  -> temp0001.html -> temptemp0002.html -> temptemptemp0003.html ...
                # 0001.html -> temp0002.html -> temptemp0003.html -> temptemptemp0004.html ...
                # ...
                #
                # which is not what we want.
                match_strings.append(url_filename_escaped)
                replace_strings.append(htmlout_filename_escaped)
                replace_garbled_strings.append(htmlout_filename_escaped_garbled)


    After building the match and replace arrays as above, I loop through all HTML files in work dir and do:

    Code:

            # Substitutions in 1st pass: we replace the original filenames with their corresponding "garbled" equivalents.
            for match_string in  match_strings:
                replace_string = replace_garbled_strings[match_strings.index(match_string)]
                page = re.sub(match_string, replace_string, page)


            # Substitutions in the 2nd pass: we replace the garbled filenames with the correct ones.
            for match_string in  replace_garbled_strings:
                replace_string = replace_strings[replace_garbled_strings.index(match_string)]
                page = re.sub(match_string, replace_string, page)

            # Replace links of the form "temp0206.html#894" with "temp0206.html".
            # The following will match anchors like '<a href="temp0206.html#894"' and will store the 'temp0206.html' in backreference 1.
            # The replace string will then replace it with '<a href="temp0206.html"', i.e. it will take away the '#894' part.
            # This is because the numbers after the '#' are often wrong or non-existent. It is better to link to an existing
            # chapter than to a non-existent part of an existing chapter.
            page = re.sub('<a href="([^#]*)#[^"]*"', '<a href="\\1"', page)


    The code is very well commented, so it says it all!
  • The rest is a huge piece of code that deals with the 100 or so new options I added to the program. This should be the subject of another post, stay tuned!
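
The two-pass renaming from the list above is easier to judge when you can run it in isolation. Here is a minimal sketch of the idea, not the code from chm2pdf itself: the function names and the sample page are made up for illustration, and it uses re.escape() instead of the hand-written escaping shown above.

```python
import re


def rename_direct(page, mapping):
    """Naive single pass: every rule also sees the output of the earlier rules."""
    for old, new in mapping:
        page = re.sub(re.escape(old), new, page)
    return page


def rename_two_pass(page, mapping):
    """Replace old names with 'garbled' placeholders first, then with the real names."""
    # The placeholder is the new name with dots replaced by underscores,
    # so no later rule can match inside an already replaced name.
    placeholders = [new.replace(".", "_") for _, new in mapping]
    for (old, _), placeholder in zip(mapping, placeholders):
        page = re.sub(re.escape(old), placeholder, page)
    for (_, new), placeholder in zip(mapping, placeholders):
        page = re.sub(re.escape(placeholder), new, page)
    return page


mapping = [
    ("toc.html", "temp0001.html"),
    ("0001.html", "temp0002.html"),
    ("0002.html", "temp0003.html"),
]
page = '<a href="toc.html">TOC</a> <a href="0001.html">1</a> <a href="0002.html">2</a>'

print(rename_direct(page, mapping))
# <a href="temptemptemp0003.html">TOC</a> <a href="temptemp0003.html">1</a> <a href="temp0003.html">2</a>
print(rename_two_pass(page, mapping))
# <a href="temp0001.html">TOC</a> <a href="temp0002.html">1</a> <a href="temp0003.html">2</a>
```

Note that the trick relies on the placeholders not occurring in the page already; a page that happens to contain the text temp0001_html would be rewritten too.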

_________________
Regards

Chris Karakas
www.karakas-online.de

chris
Posted: Wed Nov 14, 2007 7:27 pm

Before I present the 100 new options I added to chm2pdf, let me talk about what I removed first:

The chm2pdf script calls htmldoc internally to do the heavy lifting of transforming all those corrected HTML files in the work directory, in the right sequence (which is very important). However, the way this was done required pdftk, which a) is an extra dependency and b) does not help if we want to build the so-called bookmarks in the PDF - those nice tree-like links in a left pane of the reader.

The reason was that all those HTML files were converted to PDF first, then assembled with pdftk into one big PDF file. You cannot get bookmarks this way.

The solution was to pass the HTML file names to htmldoc directly. This way we don't need pdftk anymore - plus we get bookmarks if we use the --book option!
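
To illustrate what "passing the HTML file names to htmldoc directly" amounts to, here is a minimal sketch. It is not the code from chm2pdf: the function name build_htmldoc_command and the sample file names are assumptions for illustration. It only uses the htmldoc options --book, --webpage and -f, which appear in the help text of the script.

```python
def build_htmldoc_command(html_files, output_pdf, structured=True):
    """Build one htmldoc call that converts all HTML files, in order, into one PDF."""
    # --book for structured sources (gives bookmarks), --webpage otherwise.
    mode = "--book" if structured else "--webpage"
    # The order of html_files is the order of the chapters in the PDF.
    return ["htmldoc", mode, "-f", output_pdf] + list(html_files)


command = build_htmldoc_command(["temp0001.html", "temp0002.html"], "my-file.pdf")
print(" ".join(command))
```

A list like this can be handed to subprocess.run(command, check=True). Since htmldoc sees all files in a single call, it can build the bookmarks itself, which is not possible when each file is converted separately and the PDFs are glued together afterwards.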


Last edited by chris on Wed Nov 14, 2007 10:38 pm; edited 1 time in total

chris
Posted: Wed Nov 14, 2007 7:55 pm

Now, did I say option?...

This was the next problem. You had to open the script in a decent text editor, then add whatever option you needed manually to the call to htmldoc.

No more. I decided to add *all* (OK, almost all) htmldoc options to chm2pdf. The help function shows them all:

Code:

chm2pdf --help

Usage:
        /usr/bin/chm2pdf [options] input_filename [output_filename]

Options:

        --bodycolor color
                Specifies the background color for all pages.
        --bodyfont {courier,helvetica,monospace,sans,serif,times}
        --bodyimage filename.{bmp,gif,jpg,png}
        --book
                Specifies that the HTML sources are structured (headings, chapters, etc.).
        --bottom margin{in,cm,mm}
                Specifies the bottom margin in points (no suffix or ##pt), inches  (##in),  centimeters  (##cm),  or millimeters (##mm).
        --browserwidth pixels
                See http://www.htmldoc.org/newsgroups.php?ghtmldoc.general+v:3465
        --charset {cp-874...1258,iso-8859-1...8859-15,koi8-r}
                Specifies the ISO character set to use for the output.
        --color
                Specifies that PDF output should be in color.
        --compression[=level]

        --continuous
                Specifies  that  the  HTML  sources are unstructured (plain web pages).
                No page breaks are inserted between each file or URL in the output.
        --cookies 'name="value with space"; name=value'

        --datadir directory
                Specifies the  location  of  the  HTMLDOC  data  files,  usually  /usr/share/htmldoc  or  C:\Program Files\HTMLDOC
        --duplex
                Specifies that the output should be formatted for double-sided printing.
        --effectduration {0.1..10.0}
                Specifies the duration in seconds of PDF page transition effects.
        --embedfonts
                Specifies that fonts should be embedded in PDF output.
        --encryption
                Enables encryption of PDF files.
        --extract-only
                Extract the HTML files from the CHM file and stop.
                The extracted files will be found in CHM2PDF_WORK_DIR/input_filename_without_extension.
        --firstpage {p1,toc,c1}

        --fontsize {4.0..24.0}
                Specifies the default font size for body text.
        --fontspacing {1.0..3.0}
                Specifies  the  default  line  spacing  for body text.
                The line spacing is a multiplier for the font size, so a value of 1.2
                will provide an additional 20% of space between the lines.
        --footer fff

        {--format, -t} {pdf11,pdf12,pdf13,pdf14}
                Specifies the output format: pdf11
                pdf11 (PDF 1.1/Acrobat 2.0), pdf12 (PDF 1.2/Acrobat 3.0),
                pdf or pdf13 (PDF  1.3/Acrobat  4.0),  or  pdf14 (PDF 1.4/Acrobat 5.0)
        --gray

        --header fff

        --header1 fff

        --headfootfont {courier{-bold,-oblique,-boldoblique},
                helvetica{-bold,-oblique,-boldoblique},
                monospace{-bold,-oblique,-boldoblique},
                sans{-bold,-oblique,-boldoblique},
                serif{-bold,-italic,-bolditalic},
                times{-roman,-bold,-italic,-bolditalic}}
                        Sets the font to use on headers and footers.
        --headfootsize {6.0..24.0}
                Sets the size of the font to use on headers and footers.
        --headingfont {courier,helvetica,monospace,sans,serif,times}
                Sets the typeface to use for headings.
        --help
                Displays a summary of command-line options.
        --hfimage0 filename.{bmp,gif,jpg,png}
                 
        --hfimage1 filename.{bmp,gif,jpg,png}
                 
        --hfimage2 filename.{bmp,gif,jpg,png}
                 
        --hfimage3 filename.{bmp,gif,jpg,png}
                 
        --hfimage4 filename.{bmp,gif,jpg,png}
                 
        --hfimage5 filename.{bmp,gif,jpg,png}
                 
        --hfimage6 filename.{bmp,gif,jpg,png}
                 
        --hfimage7 filename.{bmp,gif,jpg,png}
                 
        --hfimage8 filename.{bmp,gif,jpg,png}
                 
        --hfimage9 filename.{bmp,gif,jpg,png}
                 
        --jpeg quality
                Sets the JPEG compression level to use for large images. A value of 0 disables JPEG compression.
        --landscape

        --left margin{in,cm,mm}
                Specifies the left margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or  millimeters (##mm).
        --linkcolor color
                Sets the color of links. You can use well-known color names like blue, or the usual #RRGGBB notation.
        --links
                Enables generation of links in PDF files (default).
        --linkstyle {plain,underline}
                Sets the style of links.
        --logoimage filename.{bmp,gif,jpg,png}
                Specifies an image to be used as a logo in the header or footer in a PDF document.
        --logoimage filename.{bmp,gif,jpg,png}
                Note that you need to use the --header and/or --footer options with the l parameter.
        --no-compression
                Disables compression of PDF file.
        --no-duplex
                Disables double-sided printing.
        --no-embedfonts
                Specifies that fonts should not be embedded in PDF and PostScript output.
        --no-encryption
                Disables document encryption.
        --no-links
                Disables generation of links in a PDF document.
        --no-localfiles

        --no-numbered
                Disables automatic heading numbering.
        --no-overflow

        --no-strict
                Disables strict HTML input checking.
        --no-title
                Disables generation of a title page.
        --no-toc
                Disables generation of a table of contents.
        --numbered
                Numbers all headings in a document.
        --nup {1,2,4,6,9,16}
                Sets  the  number of pages that are placed on each output page.  Valid values are 1, 2, 4, 6, 9, and 16.
        {--outfile, -f} filename{.pdf}
                Specifies the name of the output file. If no ending is given, ".pdf" is used.
        --overflow

        --owner-password password
                Sets the owner password for encrypted PDF files.
        --pageduration {1.0..60.0}
                Sets the view duration of a page in a PDF document.
        --pageeffect {none,bi,bo,d,gd,gdr,gr,hb,hsi,hso,vb,vsi,vso,wd,wl,wr,wu}
                Specifies the page transition effect for all pages; this attribute is ignored by all Adobe PDF viewers..
        --pagelayout {single,one,twoleft,tworight}
                Specifies the initial layout of pages for a PDF file.
        --pagemode {document,outline,fullscreen}
                Specifies the initial viewing mode for a PDF file.
        --path "dir1;dir2;dir3;...;dirN"
                Specifies a search path for files in a document.
        --permissions {all,annotate,copy,modify,print,no-annotate,no-copy,no-modify,no-print,none}
                Specifies document permissions for encrypted PDF files. Separate multiple permissions with commas.
        --portrait

        --quiet
                Suppresses all messages, even error messages.
        --right margin{in,cm,mm}
                Specifies the right margin in points (no suffix or ##pt), inches (##in), centimeters (##cm), or millimeters (##mm).
        --size {letter,a4,WxH{in,cm,mm},etc}
                Specifies the page size using a standard name or in points (no suffix or ##x##pt), inches (##x##in),
                centimeters (##x##cm), or millimeters (##x##mm). The standard sizes that  are  currently  recognized
                are "letter" (8.5x11in), "legal" (8.5x14in), "a4" (210x297mm), and "universal" (8.27x11in).
        --strict
                Enables strict HTML input checking.
        --textcolor color
                Specifies the default color of all text.
        --textfont {courier,helvetica,monospace,sans,serif,times}

        --title
                Enables the generation of a title page.
        --titlefile filename.{htm,html,shtml}
                Specifies  the  file to use for the title page. If the file is an image then the title page
                is automatically generated using the document meta data and image title.
        --titleimage filename.{bmp,gif,jpg,png}
                Specifies  the  image to use for the title page. The title page is automatically
                generated using the document meta data and title image.
        --tocfooter fff
                Sets the page footer to use on table-of-contents pages. See below for the format of fff.
        --tocheader fff
                Sets the page header to use on table-of-contents pages. See below for the format of fff.
        --toclevels levels
                Sets the number of levels in the table-of-contents.
        --toctitle string
                Sets the title for the table-of-contents.
        --top margin{in,cm,mm}
                Specifies the top margin in points (no suffix or ##pt), inches (##in), centimeters (##cm),  or  millimeters (##mm).
        --user-password password
                Specifies the user password for encryption of PDF files.
        --version
                Displays the current version number.
        --webpage
                Specifies  that  the  HTML  sources  are  unstructured  (plain web pages).
                A page break is inserted between each file or URL in the output.

        fff
                Heading format string; each 'f' can be one of:

                        . = blank
                        / = n/N arabic page numbers (1/3, 2/3, 3/3)
                        : = c/C arabic chapter page numbers (1/2, 2/2, 1/4, 2/4, ...)
                        1 = arabic numbers (1, 2, 3, ...)
                        a = lowercase letters
                        A = uppercase letters
                        c = current chapter heading
                        C = current chapter page number (arabic)
                        d = current date
                        D = current date and time
                        h = current heading
                        i = lowercase roman numerals
                        I = uppercase roman numerals
                        l = logo image
                        t = title text
                        T = current time

### Either '--book' or '--webpage' MUST be given!
### Only one of the two options can be present, not both!
### See above or try '/usr/bin/chm2pdf --help | less' to view the help contents in less.


I used the Python getopt module for this - and a lot of tedious typing!
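
For readers who have not used getopt before, here is a minimal sketch of how such option parsing can look. It is not the parsing code from chm2pdf: the function name parse_command_line is made up, only a handful of the options from the help text are listed, and the rule for the default output name is an assumption based on the examples in this thread.

```python
import getopt
import os

# A small, illustrative subset of the long options; a trailing "=" means
# that the option takes a value.
LONG_OPTIONS = ["book", "webpage", "title", "size=", "titlefile="]


def parse_command_line(argv):
    """Return (options, input_filename, output_filename) for a chm2pdf-like call."""
    opts, args = getopt.getopt(argv, "f:t:", LONG_OPTIONS)
    options = dict(opts)  # e.g. {"--book": "", "--size": "a4"}

    # Exactly one of --book and --webpage must be present.
    if ("--book" in options) == ("--webpage" in options):
        raise ValueError("Either '--book' or '--webpage' MUST be given, not both!")
    if not args:
        raise ValueError("No input file given.")

    input_filename = args[0]
    if len(args) > 1:
        output_filename = args[1]
    else:
        # Assumption: my-file.chm becomes my-file.pdf.
        stem = os.path.splitext(os.path.basename(input_filename))[0]
        output_filename = stem + ".pdf"
    return options, input_filename, output_filename


print(parse_command_line(["--size", "a4", "--book", "my-file.chm"]))
```

getopt raises getopt.GetoptError for any option that is not in the list, so misspelled options are caught early.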

chris
Posted: Wed Nov 14, 2007 8:17 pm

Having said all the above, how does one use chm2pdf?

The script uses some sensible defaults for some of the options:

Code:

--duplex
--embedfonts
--header 'c C'
--footer 'c C'
--format pdf14
--jpeg 100
--linkcolor blue
--linkstyle plain
--size a4


so if you just type:

Code:

chm2pdf --book my-file.chm


or

Code:

chm2pdf --webpage my-file.chm


you will get a PDF file called my-file.pdf which is on A4 paper, with different margins on even and odd pages (like a real book), with blue links that are NOT underlined, with embedded fonts so that everybody can enjoy it with the same fonts you do, with 100% JPEG quality in the images and with meaningful information (chapter title and page number) on headers and footers.

In case you don't agree, you have to pass the option of your choice (the sequence of options does not matter, that's the magic of the getopt module!).
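
How defaults and your own options could be combined is sketched below. This is only an illustration and not the actual logic of chm2pdf: the function name effective_options is made up, and the rule that a --no-xyz option switches off the default --xyz is my assumption.

```python
# The defaults listed above, as option -> value ("" for options without a value).
DEFAULTS = {
    "--duplex": "",
    "--embedfonts": "",
    "--header": "c C",
    "--footer": "c C",
    "--format": "pdf14",
    "--jpeg": "100",
    "--linkcolor": "blue",
    "--linkstyle": "plain",
    "--size": "a4",
}


def effective_options(user_options):
    """Start from the defaults and let the options given by the user win."""
    merged = dict(DEFAULTS)
    merged.update(user_options)
    # Assumption: "--no-duplex" removes the default "--duplex", and so on.
    for name in list(merged):
        if name.startswith("--no-"):
            merged.pop("--" + name[len("--no-"):], None)
    return merged


print(effective_options({"--size": "letter", "--no-duplex": ""}))
```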

One thing you always have to remember is to pass either the --book or the --webpage option, otherwise you will get an error. This is because htmldoc wants it this way. And it wants it this way because of the inherent difficulty of structuring a bunch of HTML files into a hierarchy of chapters: if the HTML files contained in the CHM file use <h1> tags and nest all <h*> tags properly, then you can use the --book option to get nice bookmarks and a nice PDF book out of your CHM.

If, however, as is often the case, the CHM file contains only HTML files with, say, <h3>, <h4> and <h5> headings, then your files are, as far as htmldoc is concerned, "unstructured" - and you must use the --webpage option!
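
If you don't want to open the extracted HTML files one by one, a rough first guess can be automated. The sketch below is a heuristic of my own and not part of chm2pdf: the name suggest_mode is made up, and it only checks whether any page contains an <h1> tag at all, not whether the headings are nested properly.

```python
import re

# Matches the start of an <h1> tag, in upper or lower case.
H1_PATTERN = re.compile(r"<h1[\s>]", re.IGNORECASE)


def suggest_mode(html_pages):
    """Suggest '--book' if any page has an <h1> heading, else '--webpage'."""
    for page in html_pages:
        if H1_PATTERN.search(page):
            return "--book"
    return "--webpage"


print(suggest_mode(["<html><H1>Introduction</H1><h2>Scope</h2></html>"]))
print(suggest_mode(["<html><h3>Scope</h3><h4>Details</h4></html>"]))
```

Treat the result as a hint only: if --book produces a strange chapter structure, fall back to --webpage.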

For these and the other options, you should definitely consult the htmldoc documentation - this is your friend!

A very interesting option (one that is not an htmldoc option, but purely chm2pdf-specific) is the --extract-only option:

Code:

chm2pdf --extract-only my-file.chm


will extract the contents of my-file.chm (all the HTML and other, "special", files inside the CHM) into the directory CHM2PDF_TEMP_ORIG_DIR/my-file. Thus, if CHM2PDF_TEMP_ORIG_DIR has its default value "/tmp/chm2pdf/orig", chm2pdf will extract the files into /tmp/chm2pdf/orig/my-file and stop. You can then examine the extracted files at your own pace. They will be overwritten the next time you call chm2pdf with the same file(name).
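
The way the per-file directories follow from the name of the CHM file can be sketched in a few lines. This is an illustration under the default settings, not a copy of the script: the function name per_file_dirs is made up.

```python
import os

CHM2PDF_TEMP_WORK_DIR = '/tmp/chm2pdf/work'
CHM2PDF_TEMP_ORIG_DIR = '/tmp/chm2pdf/orig'


def per_file_dirs(chm_path):
    """Return (work_dir, orig_dir) for one CHM file."""
    # Basename of the CHM file without its ending: /home/chris/my-file.chm -> my-file
    stem = os.path.splitext(os.path.basename(chm_path))[0]
    work_dir = os.path.join(CHM2PDF_TEMP_WORK_DIR, stem)
    orig_dir = os.path.join(CHM2PDF_TEMP_ORIG_DIR, stem)
    return work_dir, orig_dir


print(per_file_dirs('/home/chris/my-file.chm'))
```

Two CHM files with the same basename in different directories end up in the same work and orig directories, which is why the files are overwritten on the next call.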

Currently, the work and orig directories are not purged. This will slowly but surely fill up your /tmp directory! It is advisable to delete those directories manually from time to time until we come up with a better idea - or another option!
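
Until then, the cleanup can be done with a few lines of Python. This is a minimal sketch and not part of chm2pdf: the function name purge_temp_dirs is made up, and the default paths are simply the default values of the two variables mentioned above.

```python
import os
import shutil


def purge_temp_dirs(work_dir='/tmp/chm2pdf/work', orig_dir='/tmp/chm2pdf/orig'):
    """Delete the work and orig trees; return the directories that were removed."""
    removed = []
    for directory in (work_dir, orig_dir):
        if os.path.isdir(directory):
            # Removes the directory and everything below it.
            shutil.rmtree(directory)
            removed.append(directory)
    return removed
```

Call purge_temp_dirs() only when no conversion is running, since it removes the extracted files of all CHM files at once.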


Last edited by chris on Wed Nov 14, 2007 9:09 pm; edited 1 time in total

chris
Posted: Wed Nov 14, 2007 8:40 pm

Here are some examples of chm2pdf usage:

  • Unstructured HTML files inside the CHM file, use of --webpage option - produces my-file.pdf:

    Code:

    chm2pdf --webpage  my-file.chm
  • Structured HTML files:

    Code:

    chm2pdf --book my-file.chm
  • Structured HTML files, generate a title page automatically (--title option):

    Code:

    chm2pdf --book --title my-file.chm
  • Structured HTML files, generate a title page (--title option) and use the toc.html file found inside the CHM as the title file (--titlefile option):

    Code:

    chm2pdf --book --title --titlefile toc.html  my-file.chm


    If the file "toc.html" is not found, a warning will be issued and the option will be ignored - just browse the (very verbose) output of chm2pdf for that warning if you suspect you misspelled the titlefile name.
  • Like above, now with an explicitly set page size and maximum compression level for the resulting PDF:

    Code:

    chm2pdf --book --title --titlefile toc.html  --size 177.8x233.3mm --compression 9 my-file.chm
  • Like above, but name the PDF "your-file.pdf":

    Code:

    chm2pdf --book --title --titlefile toc.html  --size 177.8x233.3mm --compression 9 my-file.chm your-file.pdf


Some nice values regarding page sizes are:

  • Manning style: --size 187.3x235.3mm
  • O'Reilly style: --size 177.8x233.3mm
  • Wiley style: --size 189.2x235.5mm
  • Pragmatic style: --size 190.5x228.6mm
  • CRC style: --size 152.4x234.9mm
  • Medicine book style: --size 184.1x260.3mm
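
If you use these sizes often, you can keep them in a small table. The sketch below is just a convenience of my own and not part of chm2pdf: the dictionary BOOK_SIZES, its keys and the function size_option are made up, only the dimensions are taken from the list above.

```python
# Page sizes from the list above, keyed by made-up short names.
BOOK_SIZES = {
    "manning": "187.3x235.3mm",
    "oreilly": "177.8x233.3mm",
    "wiley": "189.2x235.5mm",
    "pragmatic": "190.5x228.6mm",
    "crc": "152.4x234.9mm",
    "medicine": "184.1x260.3mm",
}


def size_option(style):
    """Return the --size option for a named book style as an argument list."""
    return ["--size", BOOK_SIZES[style]]


print(" ".join(["chm2pdf", "--book"] + size_option("oreilly") + ["my-file.chm"]))
```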


Post your nice combinations of options and parameters here! We need more examples of usage - chm2pdf can now do a lot more for you!

chris
Posted: Wed Nov 14, 2007 8:58 pm

What else can we say about chm2pdf apart from its extreme flexibility with all those options above?

Well, for one, it is very fast! Until now, the only option you had if you wanted to convert from CHM to PDF was the Windows CHM Magic program. It works well (although it crashes on some CHMs), but it is very slow: it may take hours to convert a book in CHM format to PDF! Compare this to minutes or even seconds for the same file with chm2pdf! Check for yourself if you don't believe me!

Arrow Kudos to htmldoc for a really fast conversion!

For another, it is scriptable!

That does not mean much to you? Consider this: you have a few dozen CHM files that you would love to read as PDF - how do you go about it with "point-and-click" software?

It takes a month or so - or you don't do it. You have more important things to do in life than clicking on menus, after all...

With chm2pdf you type

Code:

ls *.chm | xargs -n 1 chm2pdf --book [other options here, but no CHM filenames]


or

Code:

ls *.chm | xargs -n 1 chm2pdf --webpage [other options here as above]


go get a cup of coffee - and chm2pdf is done before you finish your cup!
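
The same batch run can also be driven from Python, which helps when some of your CHM files have spaces in their names (xargs splits its input on whitespace). This is a minimal sketch, not something that ships with chm2pdf: the function name build_batch_commands is made up, and it assumes that chm2pdf is on your PATH.

```python
import glob
import os


def build_batch_commands(directory, mode="--book", extra_options=()):
    """Return one chm2pdf argument list per CHM file found in directory."""
    if mode not in ("--book", "--webpage"):
        raise ValueError("mode must be '--book' or '--webpage'")
    pattern = os.path.join(directory, "*.chm")
    # Each file name stays one list element, so spaces in names are harmless.
    return [["chm2pdf", mode, *extra_options, path]
            for path in sorted(glob.glob(pattern))]


for command in build_batch_commands("."):
    print(command)
```

To actually convert the files, pass each list to subprocess.run(command, check=True) instead of printing it.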

Enjoy!


Last edited by chris on Wed Nov 14, 2007 11:54 pm; edited 1 time in total

chris
Posted: Wed Nov 14, 2007 9:36 pm

What are the prerequisites for chm2pdf?

You need:



As said above, pdftk is not needed anymore - but it is a very useful tool and I recommend it anyway! You will love it if you ever need to concatenate 10 PDFs into one! I have compiled an RPM package for pdftk 1.12 for you - on my SUSE 9.x system. It should work on any RPM-based distribution.

The chmlib package should be compiled with the --enable-examples option passed to the configure script! This is because chm2pdf makes use of the enum_chmLib utility, which will not be built otherwise. The RPM I offer above fulfills this requirement.
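
A quick way to see whether the external programs are in place is to look them up on your PATH. The sketch below is only a convenience check and not part of chm2pdf: the function name missing_tools is made up, and it checks only the two programs named in this thread, htmldoc and enum_chmLib.

```python
import shutil


def missing_tools(tools=("htmldoc", "enum_chmLib")):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("htmldoc and enum_chmLib were found.")
```

If enum_chmLib is reported as missing although chmlib is installed, the package was most likely built without --enable-examples.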

Enjoy!

chris
Posted: Tue Jan 08, 2008 3:54 pm

A new development snapshot (build 1541) of the 1.9.x version of htmldoc was released yesterday. I offer binary and source RPMs for this new version in "How to compile htmldoc in SuSE, RPMs included!". From now on, it is advisable to use this new version of htmldoc in conjunction with chm2pdf. Enjoy!

MkIV
Private

Joined: 06 Feb 2008
Posts: 1

Posted: Wed Feb 06, 2008 8:26 pm

chris wrote:
"A new development snapshot (build 1541) of the 1.9.x version of htmldoc was released yesterday. I offer binary and source RPMs for this new version in How to compile htmldoc in SuSE, RPMs included!. From now on, it is advisable to use this new version of htmldoc in conjunction with chm2pdf. Enjoy!"


Is it possible to run it under Windows XP? I am not sure about all these libraries.

chris
Posted: Thu Feb 07, 2008 6:11 pm

You can install htmldoc on Windows, see the HTML User's Manual for instructions. Since Python is also available on Windows, you could install Python and the other Python libs needed by chm2pdf, then put (...uhmm, sorry: deploy) chm2pdf somewhere convenient for you, open it in a decent text editor and change

Code:

CHM2PDF_TEMP_WORK_DIR='/tmp/chm2pdf/work'
CHM2PDF_TEMP_ORIG_DIR='/tmp/chm2pdf/orig'


to something like:

Code:

# Raw strings (r'...'), so that the \t in C:\tmp is not read as a tab character.
CHM2PDF_TEMP_WORK_DIR=r'C:\tmp\chm2pdf\work'
CHM2PDF_TEMP_ORIG_DIR=r'C:\tmp\chm2pdf\orig'


(or, instead of "C:\tmp", to wherever your temporary directory is, or should be, global or local, no matter), as well as around line 419:

Code:

os.system ('htmldoc' + htmldoc_opts + ' ' + htmlout_filename_list + " -f "+ outputfilename + " > /dev/null")


to something like:

Code:

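os.system ('"C:\\Program Files\\HTMLDOC\\htmldoc"' + htmldoc_opts + ' ' + htmlout_filename_list + " -f "+ outputfilename + " > NUL")


(i.e. put the full path to the htmldoc executable there, in double quotes because the path contains a space and with the backslashes doubled, and redirect the output to NUL, since Windows has no /dev/null)...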

...and you should be done!

I am very interested in the results, as I didn't try it, so please come back to report how it went.

PS. Needless to say, you would then have to run chm2pdf from a "DOS" box (more correctly: a "cmd box").
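
A variant that avoids hard-coded paths and shell redirection altogether might look like the sketch below. Like the instructions above, it is untested on Windows and it is not the code of chm2pdf: the function names default_temp_dirs and run_htmldoc are made up, and using the system temporary directory is my assumption.

```python
import os
import subprocess
import tempfile


def default_temp_dirs():
    """Return (work_dir, orig_dir) below the temporary directory of the system."""
    # tempfile.gettempdir() is /tmp on most Linux systems and the user's
    # temporary folder on Windows.
    base = os.path.join(tempfile.gettempdir(), "chm2pdf")
    return os.path.join(base, "work"), os.path.join(base, "orig")


def run_htmldoc(htmldoc_path, options, html_files, output_filename):
    """Run htmldoc without a shell, discard its standard output, return the exit code."""
    command = [htmldoc_path, *options, *html_files, "-f", output_filename]
    # A list of arguments needs no quoting, even if the path contains spaces,
    # and subprocess.DEVNULL replaces both "> /dev/null" and "> NUL".
    return subprocess.run(command, stdout=subprocess.DEVNULL).returncode


print(default_temp_dirs())
```

With this, the only thing left to adapt per system is the path to the htmldoc executable that you pass to run_htmldoc().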