For those converting Project Gutenberg (PG) e-books for use in Book Designer (BD) or other HTML-based conversion software (read: HTML2LRF), check out GutenMark (http://www.sandroid.org/GutenMark/). This program takes PG plain text files and automatically converts them to HTML. It isn't perfect, and it's a command-line program, but it would sure save someone like HarryT hours and hours of changing _words_ to real italics.
It's independent from PG, but PG does link to it from their site.
I've only tried it briefly, but in my quick tests it appears to handle:
- Changing a wide variety of italics substitutes to real italics
- Changing regular quotes to curly quotes
- Changing double-dashes (--) to em dashes
- Highlighting chapter titles as headers
- Converting uppercase titles to mixed case
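To give a feel for what those conversions involve, here is a rough Python sketch of the same kinds of cleanups. This is NOT how GutenMark works internally (I haven't looked at its source), and the function names (to_html, convert_paragraph, title_case) are just ones I made up for illustration. It assumes italics are marked _like this_, that paragraphs are separated by blank lines, and that a chapter title is a single all-uppercase line.

```python
import html
import re

# Naive Roman-numeral check: any word made only of the letters IVXLCDM
# (optionally followed by a period) is left uppercase. This also catches
# ordinary words such as "DID", so treat it as a rough heuristic.
ROMAN = re.compile(r"^[IVXLCDM]+\.?$")


def title_case(line: str) -> str:
    """Turn an ALL-CAPS title into mixed case, keeping Roman numerals."""
    return " ".join(
        w if ROMAN.match(w) else w.capitalize() for w in line.split()
    )


def convert_paragraph(text: str) -> str:
    """Apply the dash, quote, and italics cleanups to one paragraph."""
    text = html.escape(text, quote=False)  # escape &, <, > but keep quotes
    text = text.replace("--", "\u2014")    # double hyphen -> em dash
    # A straight quote at the start, or after whitespace, an opening
    # bracket, an underscore, or an em dash, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[_\u2014])"', "\\1\u201c", text)
    # Whatever straight double quotes remain are treated as closing quotes.
    text = text.replace('"', "\u201d")
    # _word_ or _several words_ -> real italics
    text = re.sub(r"_([^_\n]+)_", r"<i>\1</i>", text)
    return text


def to_html(text: str) -> str:
    """Convert blank-line-separated plain text into simple HTML blocks."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if "\n" not in block and block.isupper():
            # A single all-uppercase line is assumed to be a chapter title.
            out.append("<h2>" + html.escape(title_case(block), quote=False) + "</h2>")
        else:
            out.append("<p>" + convert_paragraph(block) + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = 'CHAPTER II. THE DOOR\n\n"It was _very_ dark--too dark," she said.'
    print(to_html(sample))
```

A real converter has to cope with far messier input (single quotes and apostrophes, nested quotes, other italics conventions, headers that aren't all caps), which is presumably why a dedicated tool beats a few regexes.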
The final output looks pretty good, and would cut out most of the reformatting work in BD. Instead, you'd start pretty far along in the process, and just use BD for the final touches (TOC, title page, etc.).
Has anyone used it already? Were the results good, bad, or ugly? Any hints or suggestions on the "best" way to run it?
Thanks, and enjoy!