06-02-2009, 12:17 AM   #1
nrapallo
GuteBook/Mobi2IMP Creator
GuteBook - the Project Gutenberg eBook Maker/Front-end

GuteBook is a preprocessor for Project Gutenberg ("PG") and PG Australia HTML files (or alternatively the best .txt file available) so as to quickly and easily prepare one or many ebook versions for current ebook readers.

This project was created by Nick Rapallo (nrapallo) and was adapted from the gutlrf.pl code written by FangornUK, 10th Nov 2006 (and as recently modified May 2009).

GuteBook (Windows GUI & Perl script) directly retrieves and converts PG or PG Australia HTML files specified by EText-No. or URL. For PG Australia ebooks, the URL of the HTML file must also be given in place of the Input File, since there is no direct relationship between a PG Australia EText-No. and its URL. Once the HTML is available, the program fixes/filters many HTML items so that many current ebook formats can be created properly in one run, including .epub/.lrf/.mobi/.lit/.imp/.rb versions.

Warning: GuteBook-gui.exe and GuteBook.exe both access the internet; they retrieve the PG advanced search/external programs' webpages and the individual .zip/.htm/.txt files, respectively.

To accomplish these ebook creations, GuteBook relies heavily on external programs to facilitate the conversions. It uses calibre's Any2epub/lrf/mobi/lit as well as ETI's eBook Publisher.

Afterwards, picky/advanced users can re-edit/tweak the resulting modified .htm and easily re-create the required ebook formats via DOS batch files.

Now anyone can become a seasoned ebook creator with this easy-to-use program. So if your results ARE that good, consider contributing them to our EBook Upload forum (in the various ebook formats). Even mobileread.com's "elite" ebook creators may find it useful...

PREREQUISITES:

INSTALLATION:
  • For the Windows GUI, download the GuteBook-0.5-Installer.zip file, unzip it and execute the enclosed .exe. This will install the Windows GUI program and all other support files.
  • Added a "stripped-down" version with no Windows GUI and with no Windows Installer for those that don't want/like Windows and/or bloat. Just download and unzip the GuteBook-noGUI-noInstaller.zip file and run the programs/batch files in the 'bin' directory.
  • Source code (gutebook.pl) and files are now also available at the MobileRead.com Dev Hub.

As always, Enjoy!

EDIT 27-Mar-2011: With newer versions of calibre, you need to edit GuteBook's rebuild DOS batch file and prefix every line that starts with "ebook-convert.exe" with "start /w ". Otherwise, the first time "ebook-convert.exe" is invoked is the LAST time it's run...

I've successfully used this revised code within that DOS batch file, namely:
Code:
rem Convert .htm to Sony .epub
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.epub" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg" --chapter "//*[name()='h2']" --output-profile=sony

rem Convert .htm to Sony .lrf
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.lrf" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg"
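If the rebuild batch file has many conversion lines, the same edit could be scripted rather than done by hand. What follows is only a rough sketch in Python, not part of GuteBook: the function name patch_batch is made up for illustration, and it assumes each conversion line begins with a bare ebook-convert or ebook-convert.exe (optionally indented), not a quoted full path.

```python
import re

# Matches "ebook-convert" or "ebook-convert.exe" at the start of a line
# (leading spaces/tabs allowed). Lines that already begin with "start /w"
# do not match, so running the patch twice changes nothing.
_CONVERT_LINE = re.compile(
    r'^([ \t]*)(ebook-convert(?:\.exe)?)(?=\s|$)',
    re.IGNORECASE | re.MULTILINE,
)


def patch_batch(text):
    """Return batch-file text with 'start /w ' put in front of every
    line that starts with ebook-convert (hypothetical helper)."""
    return _CONVERT_LINE.sub(r'\g<1>start /w \g<2>', text)
```

To use it, read the batch file into a string, pass it through patch_batch and write the result back; keep a copy of the original batch file in case a line is laid out differently than assumed here.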
EDIT: added some samples of EText-No. 28700 (Robin Hood by Paul Creswick) produced in under a minute with its (non-verbose) output results:
Code:
Command Line
============

"C:\Program Files\GuteBook\bin\gutebook" 28700 --1200 --1150 --lrf --epub --keepzip --keephtm 
--pbfirsth1 --imgsrc

GuteBook (version 0.5) Copyright (C) 2009 Nick Rapallo (nrapallo)
Getting "28700" HTML file from Project Gutenberg Website... 
Fetching 571.5KB...
Extracting files...

Book Title : Robin Hood
Author     : Paul Creswick
Illustrator: NC Wyeth
Released   : May 6 2009 EBook 28700
Language   : English

Cleaning "28700" HTML...
Wrote cleaned HTML "C:\Program Files\GuteBook\28700\28700-h\28700-h.htm"
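The sample run above passes the EText-No. first and the switches after it. For converting several PG titles in one go, that command line could be assembled by a small wrapper. This is a loose sketch in Python under stated assumptions: gutebook_command and convert_all are hypothetical helper names, the install path is simply the one from the sample and may differ on your machine, and only switches that appear in the sample command line are used.

```python
import subprocess

# Path copied from the sample command line; an assumption, adjust as needed.
GUTEBOOK = r"C:\Program Files\GuteBook\bin\gutebook"

# Switches taken from the sample run above.
DEFAULT_SWITCHES = ("--lrf", "--epub", "--keepzip", "--keephtm")


def gutebook_command(etext_no, switches=DEFAULT_SWITCHES):
    """Build the argument list for one GuteBook run (hypothetical helper)."""
    return [GUTEBOOK, str(etext_no), *switches]


def convert_all(etext_numbers, switches=DEFAULT_SWITCHES):
    """Run GuteBook once per EText-No., stopping at the first failure."""
    for etext_no in etext_numbers:
        subprocess.run(gutebook_command(etext_no, switches), check=True)
```

Calling convert_all([28700]) would then launch the same kind of run as the sample, with the default switches. PG Australia titles are left out on purpose, since they also need the URL of the HTML file.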
Code:
REVISIONS:
  v0.5 - June 22, 2009
  - For GUI users:  if (blank) file called 'calibreold' (no .ext) exists in install directory,
  then use v0.5 (stable) calibre instead of new v0.6 (beta/release) calibre;
  non-GUI users can use the new switch '--calibreold' in lieu of a file called 'calibreold'! 
  - better allowed installation to different location than default "C:\Program Files".
  - improved direct download of PG Australia ebooks.  Allowed local cached copy to
  be retained using --keepzip or --keephtm; avoids subsequent PGA website downloads.
  - implemented creation of eReader .pdb when using calibre v0.6 (beta)
  - fixed handling of single dash ("-") options
  - improved print statements feedback
  - better handling of PGA metadata within .htm
  - better handling of important/necessary text after "THE END" but before PGA blurb.
  - allowed existing .txt CHARSET to be used for generated .htm meta content-type
  - better handling of --pbnofirst when <h1> already used as a pb tag
  - misc. PGA .htm fixes for color and removed fixed fontsize for <p> and <table>

  v0.4 - June 10, 2009
  - added ability to directly download PG Australia ebooks using their  EText-No. AND URL link
  to the .html placed as the Input file.
    For example, use:  --PGnum 1547A & http://gutenberg.net.au/ebooks07/0700941h.html
    Note that downloading .zip is fine, but .txt is not yet fully functional
  - improved Custom Perl Search and Replace functionality.  Still need to use "\ for any " 
  however, due to a DOS limitation, can't use ^ yet!
  - minor code/html fixes.

  v0.3 - June 4, 2009
  - add "start" anchor when the PG preamble is retained
  - remove any stray <br>'s from metadata.
  - fix GUI options loading; now properly remembers the 'search' and 'replace' strings.  
  The user must ensure that any " or / are escaped by \ within those strings.
  - simple PG title page added when --cover (GUI: 'Extract cover') specified (would be better to
  take a snapshot of this as a "cover" image)
  - option '--imgsrc' (GUI: 'Keep <img> src only') now removes "width" elements from 
  within preceding <div class=figcenter> which caused images not to be centered in .epub's

  v0.2 - June 3, 2009
  - removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
  - minor GUI / files cleanup
  
  v0.1 - June 2, 2009
  - initial public release
Previous downloads: v0.3 .pl (103); GUI (40); noGUI (24)
Attached Thumbnails
GuteBook-screenshot-main.jpg
GuteBook-screenshot-options.jpg
GuteBook-PG Australia-main.jpg
Attached Files
File Type: epub Paul Creswick - Robin Hood.epub (621.3 KB, 1462 views)
File Type: lrf Paul Creswick - Robin Hood.lrf (707.5 KB, 1014 views)
File Type: imp Paul Creswick - Robin Hood.imp (813.8 KB, 1054 views)
File Type: imp Paul Creswick - Robin Hood_1200.imp (773.1 KB, 1006 views)
File Type: zip GuteBook-0.5-Installer.zip (2.43 MB, 1329 views)
File Type: zip GuteBook-noGUI-noInstaller.zip (4.31 MB, 9630 views)
File Type: pl gutebook.pl (52.6 KB, 1190 views)
File Type: txt gutebook-command-line-help.txt (4.1 KB, 836 views)

Last edited by nrapallo; 09-26-2011 at 05:59 PM. Reason: added some samples of EText-No. 28700 (Robin Hood by Paul Creswick)