#1
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
GuteBook - the Project Gutenberg eBook Maker/Front-end
GuteBook is a preprocessor for Project Gutenberg ("PG") and PG Australia HTML files (or alternatively the best .txt file available) so as to quickly and easily prepare one or many ebook versions for current ebook readers.
This project was created by Nick Rapallo (nrapallo) and was adapted from the gutlrf.pl code written by FangornUK on 10th Nov 2006 (and as recently modified in May 2009).
GuteBook (Windows GUI & Perl script) directly retrieves and converts PG or PG Australia HTML files specified by their EText-No. or URL. PG Australia ebooks also require the URL of the HTML file to be specified in place of the Input File, since there is no direct relationship between a PG Australia EText-No. and its URL.
Once the HTML is available, the program fixes/filters many HTML items so that it can properly create many current ebook formats at the same time, including .epub/.lrf/.mobi/.lit/.imp/.rb versions.
Warning: GuteBook-gui.exe and GuteBook.exe both access the internet; they retrieve the PG advanced search/external programs' webpages and individual .zip/.htm/.txt files, respectively.
To accomplish these ebook creations, GuteBook relies heavily on external programs to perform the conversions. It uses calibre's Any2epub/lrf/mobi/lit as well as ETI's eBook Publisher.
Afterwards, picky/advanced users can re-edit/tweak the resulting modified .htm and easily re-create the required ebook formats via DOS batch files.
Now anyone can become a seasoned ebook creator with this easy-to-use program. So if your results ARE that good, consider contributing them to our EBook Upload forum (in the various ebook formats). Even mobileread.com's "elite" ebook creators may find it useful...
PREREQUISITES:
INSTALLATION:
As always, Enjoy!
EDIT 27-Mar-2011: When using newer versions of calibre, the rebuild DOS batch file created by the GuteBook conversion program needs a manual edit: prefix any line that invokes "ebook-convert.exe" with "start /w ". Otherwise, the first time "ebook-convert.exe" is invoked is the LAST time it's run...
I've successfully used this revised code within that DOS batch file, namely:
Code:
rem Convert .htm to Sony .epub
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.epub" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg" --chapter "//*[name()='h2']" --output-profile=sony
rem Convert .htm to Sony .lrf
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.lrf" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg"
Code:
Command Line
============
"C:\Program Files\GuteBook\bin\gutebook" 28700 --1200 --1150 --lrf --epub --keepzip --keephtm --pbfirsth1 --imgsrc
GuteBook (version 0.5) Copyright (C) 2009 Nick Rapallo (nrapallo)
Getting "28700" HTML file from Project Gutenberg Website...
Fetching 571.5KB...
Extracting files...
Book Title : Robin Hood
Author : Paul Creswick
Illustrator: NC Wyeth
Released : May 6 2009 EBook 28700
Language : English
Cleaning "28700" HTML...
Wrote cleaned HTML "C:\Program Files\GuteBook\28700\28700-h\28700-h.htm"
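For anyone with several rebuild batch files to fix, the 27-Mar-2011 edit described above can be applied mechanically. Here is a minimal Python sketch of that idea. It assumes each conversion line begins with ebook-convert (with or without .exe); the function name add_start_wait is made up for illustration and is not part of GuteBook.

```python
import re

# Matches a line that begins (after optional whitespace) with ebook-convert
# or ebook-convert.exe. Lines already prefixed with "start /w" and "rem"
# comment lines do not match, so they are left alone.
CONVERT_LINE = re.compile(r"^(\s*)(ebook-convert(?:\.exe)?\b.*)$", re.IGNORECASE)


def add_start_wait(batch_text: str) -> str:
    """Prefix every ebook-convert invocation with 'start /w '."""
    patched = []
    for line in batch_text.splitlines():
        match = CONVERT_LINE.match(line)
        if match:
            # Keep the original indentation, then insert the prefix.
            line = f"{match.group(1)}start /w {match.group(2)}"
        patched.append(line)
    return "\n".join(patched)


if __name__ == "__main__":
    sample = 'rem Convert .htm to Sony .lrf\nebook-convert.exe "in.htm" "out.lrf"'
    print(add_start_wait(sample))
```

Running it twice on the same text is harmless, since lines that already start with "start /w" are skipped. Keep a backup of the batch file before overwriting it.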
REVISIONS:
v0.5 - June 22, 2009
- For GUI users: if (blank) file called 'calibreold' (no .ext) exists in install directory,
then use v0.5 (stable) calibre instead of new v0.6 (beta/release) calibre;
non-GUI users can use the new switch '--calibreold' in lieu of a file called 'calibreold'!
- better allowed installation to different location than default "C:\Program Files".
- improved direct download of PG Australia ebooks. Allowed local cached copy to
be retained using --keepzip or --keephtm; avoids subsequent PGA website downloads.
- implemented creation of eReader .pdb when using calibre v0.6 (beta)
- fixed handling of single dash ("-") options
- improved print statements feedback
- better handling of PGA metadata within .htm
- better handling of important/necessary text after "THE END" but before PGA blurb.
- allowed existing .txt CHARSET to be used for generated .htm meta content-type
- better handling of --pbnofirst when <h1> already used as a pb tag
- misc. PGA .htm fixes for color and removed fixed fontsize for <p> and <table>
v0.4 - June 10, 2009
- added ability to directly download PG Australia ebooks using their EText-No. AND URL link
to the .html placed as the Input file.
For example, use: --PGnum 1547A & http://gutenberg.net.au/ebooks07/0700941h.html
Note that downloading .zip is fine, but .txt is not yet fully functional
- improved Custom Perl Search and Replace functionality. You still need to use \" for any ";
however, due to a DOS limitation, ^ can't be used yet!
- minor code/html fixes.
v0.3 - June 4, 2009
- add "start" anchor when the PG preamble is retained
- remove any stray <br>'s from metadata.
- fix GUI options loading; now properly remembers the 'search' and 'replace' strings.
The user must ensure that any " or / are escaped by \ within those strings.
- simple PG title page added when --cover (GUI: 'Extract cover') specified (would be better to
take a snapshot of this as a "cover" image)
- option '--imgsrc' (GUI: 'Keep <img> src only') now removes "width" elements from
within preceding <div class=figcenter> which caused images not to be centered in .epub's
v0.2 - June 3, 2009
- removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
- minor GUI / files cleanup
v0.1 - June 2, 2009
- initial public release
Last edited by nrapallo; 09-26-2011 at 06:59 PM. Reason: added some samples of EText-No. 28700 (Robin Hood by Paul Creswick)
#2
GuteBook/Mobi2IMP Creator
FAQ
Reserved for FAQ/Tutorial.
1. Do I have to use the Windows GUI?
No, that's optional, but it is easier than remembering and manually typing the required switches. The Perl script came first and is perfectly usable on its own. The 'samples' directory in the GuteBook install directory shows some 'command line' examples which can be used with the Perl script, i.e.
Code:
call ..\bin\do-ge 28700 --1200 --lrf "--LRmargins 2px" --keepzip --keephtm --pbfirsth1 --smaller
i.e. gutebook.pl 28700 --1200 --lrf --LRmargins 2px --keepzip --keephtm --pbfirsth1 --smaller >gutebook.log
2. Do I have to use/select an ebook output format?
No, but then GuteBook will only preprocess the .htm; it will not create any ebooks, nor set up the batch file used to re-generate the ebooks after re-editing the modified .htm. However, you can use the resulting .opf with, say, Mobipocket Creator and manually generate a .mobi and then feed that to calibre or any other mobi2... program (like Mobi2IMP). The choice is yours!
3. How do I download a Project Gutenberg Australia book?
Quick HOW-TO example:
Output results:
Code:
Command Line
============
"C:\Program Files\GuteBook\bin\gutebook" --PGnum 0364A "http://gutenberg.net.au/ebooks04/0400561h.html" --epub --lrf --1200 --1150 --smallerfont --search "<h1>(<a name=.*?)</h1>" --replace "<h2>$1</h2>" --modi --modg
GuteBook (version 0.4) Copyright (C) 2009 Nick Rapallo (nrapallo)
Getting "0364A" HTML file from Project Gutenberg (Australia) Website
Please Wait... Downloading .
http://gutenberg.net.au/ebooks04/0400561h.html saved to C:\Program Files\GuteBook\0364A\0400561h.xhtml
Renamed .xhtml to .htm .
Book Title : The Robe (1942)
Author : Lloyd C. Douglas
eBook No. : 0400561h.html
Language : English
Released : July 2004
Cleaning "0364A" HTML...
Wrote cleaned HTML "C:\Program Files\GuteBook\0364A\0400561h.htm"
Press any key to continue . . .
4. Troubleshooting: Why doesn't GuteBook find or download my requested book?
Etext-No.'s below 10,000 are sometimes problematic, as many of the earlier etext no.'s don't follow the current/normal filenaming pattern of http://www.gutenberg.org/files/EXTEXTNO/EXTEXTNO-h.zip.
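To make the normal pattern concrete, here is a small Python sketch. It only restates the URL layout quoted above plus the idea of an Input File override; the names pg_html_zip_url and resolve_input are hypothetical and do not come from gutebook.pl, whose actual lookup logic is not shown in this thread.

```python
from typing import Optional

# Base of the "normal" filenaming pattern quoted in the post (an assumption
# that it applies only to books that follow the current layout).
PG_FILES_BASE = "http://www.gutenberg.org/files"


def pg_html_zip_url(etext_no: int) -> str:
    """Build the usual location of the zipped HTML edition for an EText-No."""
    return f"{PG_FILES_BASE}/{etext_no}/{etext_no}-h.zip"


def resolve_input(etext_no: int, input_file: Optional[str] = None) -> str:
    """Prefer an explicit Input File override; otherwise use the pattern."""
    return input_file if input_file else pg_html_zip_url(etext_no)


if __name__ == "__main__":
    # Normal case: the pattern is enough.
    print(resolve_input(28700))
    # Older book: the override link is used as given.
    print(resolve_input(7471, "http://www.gutenberg.org/dirs/etext05/2left10.zip"))
```

When the pattern fails, nothing is guessed: the override string is returned unchanged, which mirrors pasting a link into the Input File box.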
Let's say you've entered the number 7471 as the book listed on PG (it's a collection of short stories by P. G. Wodehouse), but GuteBook fails to find, download and convert the file. It's nothing you've done wrong; this ebook simply doesn't follow the normal filename pattern, so it needs to be overridden by placing the following link in the Input File box: http://www.gutenberg.org/dirs/etext05/2left10.zip .
Just so that you know, I got that link from the Gutenberg ebook page for Etext-No. 7471 by copying the link to the .zip text (or html) version.
Also, since that ebook is only available as text, you will need GutenMark (GUItenMark) installed and selected on the first page (see GUI screenshot). This ebook needs to be converted internally to html using GutenMark so that GuteBook can produce an ebook version.
Try it again, as before, but just override the Input File in this case.
Last edited by nrapallo; 02-25-2010 at 05:01 PM.
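The --search/--replace pair in the PG Australia command line above turns anchored <h1> headings into <h2> headings. The Python sketch below shows what that regular expression does, which can help when testing a pattern before handing it to GuteBook. Python writes the back-reference as \1 where Perl uses $1. How GuteBook applies the pattern internally (flags, global matching) is not stated in this thread, so the re.sub call is only an approximation, and the function name is invented.

```python
import re

# Same pattern as in the --search switch: an <h1> whose content starts
# with a named anchor. The lazy .*? stops at the first closing </h1>.
SEARCH = r"<h1>(<a name=.*?)</h1>"
# Python spelling of the --replace value "<h2>$1</h2>".
REPLACE = r"<h2>\1</h2>"


def demote_anchor_headings(html: str) -> str:
    """Rewrite every matching <h1> as an <h2>, keeping the anchor content."""
    return re.sub(SEARCH, REPLACE, html)


if __name__ == "__main__":
    print(demote_anchor_headings('<h1><a name="ch1">Chapter 1</a></h1>'))
```

Headings without a leading named anchor are left untouched, which is presumably why the pattern was written this way for that book.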
#3
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
Congratulations.
I tried your script extensively. It works really well and the resulting eBooks are more than adequate, even great compared to ManyBooks' efforts. As always, your GUI is well thought out and works admirably.
A shame that there is no auto creation of covers though, but I am not aware of any freeware or open-source software that does this. Apart from that and some coding idiocies coming from the Gutenberg coders (making TOCs with page numbers as links, for example), the generated eBooks are instantly readable and the different formats compare favorably. As you're aware yourself, there are still some problems with illustration sizes in ePub.
For a first version, this is superb work! K to you!
As an aside for iPhone/iPod users: Use this to convert your Gutenberg books to ePub for Stanza. They will be perfectly formatted and you can autogenerate a cover. Best solution so far!
Last edited by mtravellerh; 06-02-2009 at 05:08 AM.
#4
zeldinha zippy zeldissima
Posts: 27,827
Karma: 921169
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
wow nick, you've done it again !! thank you so much for all your hard work on these brilliant tools which make ebook creation so much easier. karma to you for this !
#5
GuteBook/Mobi2IMP Creator
Quote:
The Perl script could also allow one to use a generic cover page that is placed in, say, the install directory. While this cover image would be external to the source .htm for .mobi ebooks, the other ebook formats would probably use an additional cover.htm page before the source .htm. This way it would be more compatible across all formats.
Quote:
Also missing, but a worthy addition, is autogenerating a Table of Contents ("TOC") and placing it at the end. However, most PG HTML versions already have a "Contents" section, so I'll wait and see if there is demand for such a TOC feature.
Obviously, working with HTML as a starting point makes it easier to get all the "bells and whistles" we are used to seeing in hand-crafted ebooks created by those, like yourself, that do a marvellous job!
In future, once the experimental PG .mobi and .epub offerings become more standard, I can switch to using those as input instead of the HTML versions.
Working with .txt may require much more "polishing" by hand. Currently, GutenMark transforms any .txt-only ebooks into acceptable .htm ebooks. I may incorporate this ability within the Perl script using gut.pl (or newgut.pl discussed here)!
While GuteBook cannot be expected to properly detect and handle ALL PG quirks and idiosyncrasies, it makes a valiant attempt. I can improve the Perl script to "accommodate" any easily fixed quirk once it is made known which EText-No. PG ebook displays it. If you experience any formatting glitches, you can post your findings/fixes here and discuss/support their inclusion into future versions of GuteBook. The squeaky wheel...
Last edited by nrapallo; 06-02-2009 at 11:08 PM.
#6
Wizard
Posts: 1,177
Karma: 2431850
Join Date: Sep 2008
Device: IPad Mini 2 Retina
The biggest problem I've found with PG texts is that some of them have lost information contained in the original source book, e.g.,
- Italics--many PG texts represent italics using UPPER CASE, which confounds real upper case with italics. For example, the following source will all be represented the same in the PG text (as HELLO WORLD): hello world, Hello World, HELLO WORLD
- No "smart quotes". Opening and closing quotes are represented by the same character, sometimes a double quote, and sometimes a single quote. When single quotes are used, this is confused with apostrophes, which are usually represented by single quotes.
- Confusion between hyphens and em-dashes. Hyphens should be represented by "-", and em-dashes by "--" in the PG text, which is easy to convert, but this is not always the case.
- Indented text.
In my own private utilities I fix the above: convert to using opening/closing double quotes, correct apostrophe characters, correct use of italics, etc. I do this automatically for many cases, and for the more difficult ones, I prompt the user for their resolution (what the correct character should be).
I wonder what your thoughts are on this and how it applies to GuteBook?
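For readers wondering what the "automatic" cases might look like, here is a rough Python sketch covering only the easy ones: "--" to an em-dash, straight double quotes to opening/closing quotes, and an apostrophe between two letters. It is not the poster's utility (which isn't shown in this thread), the function name smarten is invented, and the rules are simple positional assumptions. Single-quoted dialogue and UPPER CASE italics are deliberately left alone, since those are the cases that need a human decision.

```python
import re

EM_DASH = "\u2014"
OPEN_DQUOTE = "\u201c"
CLOSE_DQUOTE = "\u201d"
APOSTROPHE = "\u2019"


def smarten(text: str) -> str:
    """Apply a few conservative typographic fixes to PG-style plain text."""
    # PG convention: "--" stands for an em-dash; single hyphens are untouched.
    text = text.replace("--", EM_DASH)
    # A double quote at the start of the text, or right after whitespace,
    # an opening bracket or an em-dash, is treated as an opening quote.
    text = re.sub(rf'(^|[\s(\[{EM_DASH}])"', rf"\1{OPEN_DQUOTE}", text)
    # Every double quote still left is treated as a closing quote.
    text = text.replace('"', CLOSE_DQUOTE)
    # A single quote between two word characters is an apostrophe (it's, don't).
    text = re.sub(r"(?<=\w)'(?=\w)", APOSTROPHE, text)
    return text


if __name__ == "__main__":
    print(smarten('"Hello," she said--"it\'s late."'))
```

Anything these rules cannot classify, such as a single quote at the start of a word, stays as typed, so a later pass or a prompt to the user can deal with it.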
#7
GuteBook/Mobi2IMP Creator
Quote:
And you are the "ideal" candidate to make use of GuteBook's simultaneous generation of .epub/.lrf and .imp versions!
Let me know if you need any help setting this up (a Mobi2IMP and PDFRead installation procedure ripoff).
#8
GuteBook/Mobi2IMP Creator
Quote:
When dealing with .txt input, I've initially chosen to use GutenMark, as my goal was to steer away from "reproducing" its functionality and instead concentrate on making the resulting .htm work/display better in dedicated ebook readers. So while I applaud your efforts, it's not my focus here.
I did look at many PG .txt to .htm routines (gut.pl, newgut.pl, gtxt2html.pl and even gutenbrowser) and I know it is a tremendous undertaking, so I just reserved the right to address this issue in future releases.
Are your utilities/code freely available? In Perl?
Last edited by nrapallo; 06-02-2009 at 11:50 AM. Reason: typo
#9
GuteBook/Mobi2IMP Creator
Just added some samples of EText-No. 28700 (Robin Hood by Paul Creswick) to post #1 above.
Enjoy!
#10
Wizard
#11
GuteBook/Mobi2IMP Creator
#12
GuteBook/Mobi2IMP Creator
Added a "stripped-down" version with no Windows GUI and with no Windows Installer for those that don't want/like bloat.
Just download and unzip the GuteBook-noGUI-noInstaller.zip file in post #1 above and run the programs/batch files in the 'bin' directory.
At least now, I can get JSWolf to look at (and hopefully try) it...
#13
GuteBook/Mobi2IMP Creator
version 0.2
Updated GuteBook and Installer to version 0.2 in post #1 above.
REVISIONS:
v0.2 - June 3, 2009
- removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
- minor GUI / files cleanup
#14
Zealot
Posts: 101
Karma: 1940
Join Date: Apr 2009
Location: Denver, CO
Device: Libra H2O
If this is addressed elsewhere, my apologies, but is there anything like this for Mac users? I've been noticing a kinda profound lack of Mac apps for ebook creation/conversion.
I'm playing with Calibre, but curious if there's others out there. The tool looks great Nick!
#15
GuteBook/Mobi2IMP Creator
Quote:
I know that calibre and eBook Publisher work in Mac OS X, however the gutebook.pl script relies on the Windows SBPubX COM interface for .imp creation (my ebook reader's format), so it may also require the Windows eBook Publisher to be installed under wine. I've done this previously for my Mobi2IMP software for my Linux netbook using the same NSIS Installer. I'm sorry but I don't have access to any Mac to test this stuff on and/or debug it. Perhaps, someone else can help make this work under Mac OS X. I think it's doable! Can you at least execute the Perl script?
Tags: ebook, front end, gui, gutenberg, maker, perl