09-20-2007, 03:05 PM   #1
phrodod began at the beginning.
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
GutenMark to convert PG books to HTML

For those converting Project Gutenberg (PG) e-books for use in Book Designer (BD) or other HTML-based conversion software (read: HTML2LRF), check out GutenMark. This program takes PG plain-text files and automatically converts them to HTML. It isn't perfect, and it's a command-line program, but it would sure save someone like HarryT hours and hours of changing _words_ to italic words.

It's independent of PG, but PG does link to it from its site.

I've only tried it briefly, but in my quick tests it appears to handle:
  • Changing a wide variety of italics substitutes to real italics
  • Changing regular quotes to curly quotes
  • Changing double-dashes (--) to em dashes
  • Highlighting chapter titles as headers
  • Converting uppercase titles to mixed case
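
To give a feel for what the first three substitutions in the list involve, here is a rough Python sketch. It is not GutenMark's code and makes no claim about how GutenMark works internally; the function name `markup_plain_text` and the regular expressions are assumptions chosen for illustration, and they handle only the simplest cases (single-line `_italics_`, `--`, and straight double quotes).

```python
import re


def markup_plain_text(text: str) -> str:
    """Hypothetical helper: apply a few plain-text conventions as HTML markup.

    Illustrative only; this is not how GutenMark is implemented.
    """
    # _word_ or _several words_ on one line -> <i>...</i>
    text = re.sub(r"_([^_\n]+)_", r"<i>\1</i>", text)

    # Double dash -> em dash entity
    text = text.replace("--", "&mdash;")

    # A straight quote at the start of the text, or right after whitespace
    # or an opening bracket, is treated as an opening curly quote.
    text = re.sub(r'(^|[\s(\[])"', r"\1&ldquo;", text)

    # Any straight double quote left over is treated as a closing quote.
    text = text.replace('"', "&rdquo;")
    return text


if __name__ == "__main__":
    sample = 'He said, "It was _very_ late--too late."'
    print(markup_plain_text(sample))
```

A real converter has to cope with much messier input (quotes after dashes, italics that span lines, apostrophes, and so on), which is presumably where a dedicated tool earns its keep over a handful of regexes.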

The final output looks pretty good, and would sure save hours of reformatting in BD. Instead, you'd start pretty far along in the process, and just use BD for the final touches (TOC, title page, etc.).

Has anyone used it already? Were the results good, bad, or ugly? Any hints or suggestions on the "best" way to run it?

Thanks, and enjoy!