GuteBook is a preprocessor for Project Gutenberg ("PG") and PG Australia HTML files (or alternatively the best .txt file available) that quickly and easily prepares one or many ebook versions for current ebook readers.
This project was created by Nick Rapallo (nrapallo) and was adapted from the gutlrf.pl code written by FangornUK on 10th Nov 2006 (and modified as recently as May 2009).
GuteBook (Windows GUI & Perl script) directly retrieves and converts PG or PG Australia HTML files specified by their EText-No. or URL. PG Australia ebooks require the URL link to the HTML to also be specified in place of the Input File, since there is no direct relationship between the PG Australia EText-No. and its URL. Once the HTML is available, the program fixes/filters many HTML items so that it can properly create many current ebook formats simultaneously, including .epub/.lrf/.mobi/.lit/.imp/.rb versions.
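The input rule above (an EText-No. alone for PG, an EText-No. plus a URL for PG Australia) can be sketched as follows. This is only an illustration in Python, not GuteBook's actual Perl code: the function name `resolve_source`, the `Source` tuple, and the host check on gutenberg.net.au are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Source(NamedTuple):
    site: str            # "PG" or "PG Australia"
    etext_no: str
    url: Optional[str]   # None means "locate the HTML from the EText-No."


def resolve_source(etext_no: str, input_file: Optional[str] = None) -> Source:
    """Hypothetical helper: decide where the HTML should come from."""
    if input_file and re.match(r"https?://", input_file, re.IGNORECASE):
        # An explicit URL always wins. PG Australia needs one because its
        # EText-No. has no direct relationship to its URL.
        host_is_pga = "gutenberg.net.au" in input_file.lower()
        return Source("PG Australia" if host_is_pga else "PG", etext_no, input_file)
    if re.fullmatch(r"\d+", etext_no):
        # Assumption: a purely numeric EText-No. refers to Project Gutenberg.
        return Source("PG", etext_no, None)
    raise ValueError(f"EText-No. {etext_no!r} needs a URL as the Input File")


if __name__ == "__main__":
    print(resolve_source("28700"))
    print(resolve_source("1547A", "http://gutenberg.net.au/ebooks07/0700941h.html"))
```

Raising an error for a non-numeric EText-No. without a URL mirrors the requirement that PG Australia downloads always supply the link to the .html.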
Warning: GuteBook-gui.exe and GuteBook.exe both access the internet; they retrieve the PG advanced search/external programs' webpages and individual .zip/.htm/.txt files, respectively.
To create these ebooks, GuteBook relies heavily on external programs for the conversions. It uses calibre's Any2epub/lrf/mobi/lit as well as ETI's eBook Publisher.
Afterwards, picky/advanced users can re-edit/tweak the resulting modified .htm and easily re-create the various ebook formats required via DOS batch files.
Now anyone can become a seasoned ebook creator with this easy-to-use program. So if your results ARE that good, consider contributing them (in the various ebook formats) to our EBook Upload forum. Even mobileread.com's "elite" ebook creators may find it useful...
Source code (gutebook.pl) and files are now also available at the MobileRead.com Dev Hub.
- For the Windows GUI, download the GuteBook-0.5-Installer.zip file, unzip it and execute the enclosed .exe. This will install the Windows GUI program and all other support files.
- A "stripped-down" version with no Windows GUI and no Windows Installer is also available for those who don't want/like Windows and/or bloat. Just download and unzip the GuteBook-noGUI-noInstaller.zip file and run the programs/batch files in the 'bin' directory.
As always, Enjoy!
Note: When using newer versions of calibre, you need to edit the GuteBook conversion program's rebuild DOS batch file and prefix every line that invokes "ebook-convert.exe" with "start /w ". Otherwise, the first time "ebook-convert.exe" is invoked is the LAST time it's run...
I've successfully used this revised code within that DOS batch file, namely:
rem Convert .htm to Sony .epub
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.epub" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg" --chapter "//*[name()='h2']" --output-profile=sony
rem Convert .htm to Sony .lrf
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.lrf" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg"
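Rather than editing the rebuild batch file by hand, the prefixing described above can be automated. Below is a minimal sketch in Python, assuming the batch file invokes the converter unquoted at the start of a line as either `ebook-convert` or `ebook-convert.exe`. The function name `prefix_start_wait` is hypothetical and not part of GuteBook.

```python
import re

# Matches a line that begins (after optional indentation) with an unquoted
# ebook-convert or ebook-convert.exe. Lines that already begin with "start"
# or "rem" do not match, so the patch is safe to apply more than once.
_CONVERT_LINE = re.compile(r"^(\s*)(ebook-convert(?:\.exe)?)(?=\s|$)", re.IGNORECASE)


def prefix_start_wait(batch_text: str) -> str:
    """Return batch_text with 'start /w ' inserted before each ebook-convert call."""
    patched = []
    for line in batch_text.splitlines(keepends=True):
        # keepends=True preserves the original line endings (e.g. CRLF).
        patched.append(_CONVERT_LINE.sub(r"\1start /w \2", line, count=1))
    return "".join(patched)


if __name__ == "__main__":
    sample = 'rem Convert .htm to Sony .lrf\nebook-convert.exe "in.htm" "out.lrf"\n'
    print(prefix_start_wait(sample), end="")
```

Quoted program names are deliberately left alone: `start` treats a leading quoted argument as a window title, so those lines would need a manual fix.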
Added some samples of EText-No. 28700 (Robin Hood by Paul Creswick), produced in under a minute. The command used and its (non-verbose) output:
"C:\Program Files\GuteBook\bin\gutebook" 28700 --1200 --1150 --lrf --epub --keepzip --keephtm
GuteBook (version 0.5) Copyright (C) 2009 Nick Rapallo (nrapallo)
Getting "28700" HTML file from Project Gutenberg Website...
Book Title : Robin Hood
Author : Paul Creswick
Illustrator: NC Wyeth
Released : May 6 2009 EBook 28700
Language : English
Cleaning "28700" HTML...
Wrote cleaned HTML "C:\Program Files\GuteBook\28700\28700-h\28700-h.htm"
v0.5 - June 22, 2009
- For GUI users: if a (blank) file called 'calibreold' (no .ext) exists in the install directory,
then v0.5 (stable) calibre is used instead of the new v0.6 (beta/release) calibre;
non-GUI users can use the new switch '--calibreold' in lieu of a file called 'calibreold'!
- better support for installing to a location other than the default "C:\Program Files".
- improved direct download of PG Australia ebooks. Allowed local cached copy to
be retained using --keepzip or --keephtm; avoids subsequent PGA website downloads.
- implemented creation of eReader .pdb when using calibre v0.6 (beta)
- fixed handling of single dash ("-") options
- improved print statements feedback
- better handling of PGA metadata within .htm
- better handling of important/necessary text after "THE END" but before PGA blurb.
- allowed existing .txt CHARSET to be used for generated .htm meta content-type
- better handling of --pbnofirst when <h1> already used as a pb tag
- misc. PGA .htm fixes for color and removed fixed fontsize for <p> and <table>
v0.4 - June 10, 2009
- added ability to directly download PG Australia ebooks using their EText-No. AND URL link
to the .html placed as the Input file.
For example, use: --PGnum 1547A & http://gutenberg.net.au/ebooks07/0700941h.html
Note that downloading .zip is fine, but .txt is not yet fully functional
- improved Custom Perl Search and Replace functionality. Still need to use \" for any ";
however, due to a DOS limitation, ^ can't be used yet!
- minor code/html fixes.
v0.3 - June 4, 2009
- add "start" anchor when the PG preamble is retained
- remove any stray <br>'s from metadata.
- fix GUI options loading; now properly remembers the 'search' and 'replace' strings.
The user must ensure that any " or / are escaped by \ within those strings.
- simple PG title page added when --cover (GUI: 'Extract cover') specified (would be better to
take a snapshot of this as a "cover" image)
- option '--imgsrc' (GUI: 'Keep <img> src only') now removes "width" elements from
within preceding <div class=figcenter> which caused images not to be centered in .epub's
v0.2 - June 3, 2009
- removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
- minor GUI / files cleanup
v0.1 - June 2, 2009
- initial public release
Previous downloads: v0.3 .pl (103); GUI (40); noGUI (24)