Utility for converting Gutenberg books.


FangornUK
10-20-2006, 07:52 AM
Here's a little Perl script I knocked together for converting Gutenberg HTML books. It produces an HTML format suitable for the Librie Toolbar (http://www.sven.de/librie/Librie/AddonSoftware), which creates BBeB books.

You also need wget and unzip installed for it to work. It should work fine on Mac OS X and Linux. For Windows I use Cygwin (cygwin.com).

To use it, simply pass the URL of the zipped HTML file from the Gutenberg page for the book you're interested in, for example:
guthtml.pl http://www.gutenberg.org/files/17290/17290-h.zip
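For anyone curious about the download-and-unpack step, here is a rough Python sketch of the same idea without wget or unzip. It is not the Perl script itself, only an illustration: the function names (archive_name, extract_html, fetch_book) are made up for this example, and it does none of the HTML clean-up that produces "new.htm".

```python
import io
import zipfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def archive_name(url: str) -> str:
    """Return the file name part of a Gutenberg ZIP URL (hypothetical helper)."""
    return Path(urlparse(url).path).name


def extract_html(zip_bytes: bytes, dest: Path) -> list[Path]:
    """Unpack the archive into dest and return the HTML files found, sorted."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
    return sorted(
        p for p in dest.rglob("*") if p.suffix.lower() in {".htm", ".html"}
    )


def fetch_book(url: str, dest: Path) -> list[Path]:
    """Download the zipped HTML book and unpack it under dest/<archive stem>."""
    target = dest / Path(archive_name(url)).stem
    with urlopen(url) as response:  # needs network access
        data = response.read()
    return extract_html(data, target)


if __name__ == "__main__":
    files = fetch_book(
        "http://www.gutenberg.org/files/17290/17290-h.zip", Path("books")
    )
    for html_file in files:
        print(html_file)
```

The HTML files it lists are the raw Gutenberg ones, so they would still need the clean-up pass before going through the Librie Toolbar.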

Then you just open the "new.htm" file in Internet Explorer and use the Librie Toolbar to create BBeBs :D At the moment I find the Librie Toolbar produces the best eBooks, as it has the cleanest fonts and these are the fastest on the Reader. It doesn't produce perfect conversions, but until a proper tool comes out this is what I use.

To convert text-based versions of Gutenberg books, I simply use GutenMark and convert the resulting HTML file with the Librie Toolbar.

The script can also call htmldoc if you want to create PDFs for the Reader; simply uncomment the last few lines in the script.
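As an illustration of what such an htmldoc call could look like, here is a small Python sketch. The exact options used in the Perl script are not shown in this thread, so the command below is an assumption: it uses htmldoc's --webpage mode and -f output option, and the helper names (htmldoc_command, make_pdf) and the output name "book.pdf" are invented for the example.

```python
import subprocess
from pathlib import Path


def htmldoc_command(html_file: Path, pdf_file: Path) -> list[str]:
    """Build an htmldoc command line (assumed options, not the script's own)."""
    # --webpage treats the input as a plain web page rather than a structured book;
    # -f names the output file, and htmldoc picks PDF from the .pdf extension.
    return ["htmldoc", "--webpage", "-f", str(pdf_file), str(html_file)]


def make_pdf(html_file: Path, pdf_file: Path) -> None:
    """Run htmldoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(htmldoc_command(html_file, pdf_file), check=True)


if __name__ == "__main__":
    make_pdf(Path("new.htm"), Path("book.pdf"))
```

Page size and margins would probably need tuning for the Reader's screen; those settings are left out here because the thread does not give them.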

If anyone knows how to get page breaks to work in the Librie Toolbar, please let me know.

TadW
10-20-2006, 03:32 PM
Good work, Fangorn.

FangornUK
11-02-2006, 09:16 AM
There has been some discussion about Gutenberg books not having any markup, but for quite a while most new books have had HTML versions with the markup from the original books. Many older text versions are being updated with HTML versions. For those old text ones that have no HTML, I really recommend GutenMark, as it does a good job of putting back the markup and restoring non-ASCII characters such as umlauts.

I've attached an ebook for the Sony Reader, converted from a Gutenberg HTML file using the Perl script above and then put through the Librie Toolbar to create a BBeB file (LRF). Nothing was edited in the files to produce this ebook. I still haven't managed to figure out how to create page breaks with the toolbar, though.