#1
GuteBook/Mobi2IMP Creator
Posts: 2,386
Karma: 24558
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150 Device: REB1100, iLiad v2 System: WinXP SP3
GuteBook - the Project Gutenberg eBook Maker/Front-end
GuteBook is a preprocessor for Project Gutenberg ("PG") and PG Australia HTML files (or alternatively the best .txt file available) so as to quickly and easily prepare one or many ebook versions for current ebook readers.
This project was created by Nick Rapallo (nrapallo) and was adapted from the gutlrf.pl code written by FangornUK, 10th Nov 2006 (and as recently modified in May 2009).

GuteBook (Windows GUI & Perl script) directly retrieves and converts PG or PG Australia HTML files specified by EText-No. or URL. For PG Australia ebooks, the URL link to the HTML must also be specified in place of the Input File, since there is no direct relationship between a PG Australia EText-No. and its URL. Once the HTML is available, the program fixes/filters many HTML items so that it can simultaneously create many current ebook formats, including .epub/.lrf/.mobi/.lit/.imp/.rb versions.

Warning: GuteBook-gui.exe and GuteBook.exe both access the internet; they retrieve PG advanced search/External programs' webpages and individual .zip/.htm/.txt files respectively.

To accomplish these ebook creations, GuteBook relies heavily on external programs to perform the conversions. It uses calibre's Any2epub/lrf/mobi/lit as well as ETI's eBook Publisher. Afterwards, picky/advanced users can re-edit/tweak the resulting modified .htm and easily re-create the required ebook formats via DOS batch files.

Now anyone can become a seasoned ebook creator with this easy-to-use program. So if your results ARE that good, consider contributing them to our EBook Upload forum (in the various ebook formats). Even mobileread.com's "elite" ebook creators may find it useful...

PREREQUISITES:
INSTALLATION:
As always, Enjoy!

EDIT: added some samples of EText-No. 28700 (Robin Hood by Paul Creswick) produced in under a minute with its (non-verbose) output results:

Code:
Command Line
============
"C:\Program Files\GuteBook\bin\gutebook" 28700 --1200 --1150 --lrf --epub --keepzip --keephtm --pbfirsth1 --imgsrc

GuteBook (version 0.5) Copyright (C) 2009 Nick Rapallo (nrapallo)

Getting "28700" HTML file from Project Gutenberg Website...
Fetching 571.5KB...
Extracting files...

Book Title : Robin Hood
Author : Paul Creswick
Illustrator: NC Wyeth
Released : May 6 2009 EBook 28700
Language : English

Cleaning "28700" HTML...
Wrote cleaned HTML "C:\Program Files\GuteBook\28700\28700-h\28700-h.htm"
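For scripted or repeated runs, the sample command line above can be assembled as an argument list rather than typed by hand. The following is only a minimal Python sketch: the helper name `build_gutebook_command` is hypothetical, the install path is simply the one shown in the sample, and only switches that appear in the sample are used.

```python
import subprocess

# Path copied from the sample run above; adjust it to your own install location.
GUTEBOOK = r"C:\Program Files\GuteBook\bin\gutebook"


def build_gutebook_command(etext_no, switches, exe=GUTEBOOK):
    """Return the argument list for one GuteBook run (hypothetical helper)."""
    args = [exe, str(etext_no)]
    for name in switches:
        # Every switch in the sample uses the double-dash form.
        args.append(name if name.startswith("--") else "--" + name)
    return args


cmd = build_gutebook_command(
    28700,
    ["1200", "1150", "lrf", "epub", "keepzip", "keephtm", "pbfirsth1", "imgsrc"],
)

# Show the command as it would appear on a Windows command line.
print(subprocess.list2cmdline(cmd))

# To actually run it (requires GuteBook to be installed):
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids manual quoting of the "C:\Program Files" path, which contains a space.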
REVISIONS:
v0.5 - June 22, 2009
- For GUI users: if (blank) file called 'calibreold' (no .ext) exists in install directory,
then use v0.5 (stable) calibre instead of new v0.6 (beta/release) calibre;
non-GUI users can use the new switch '--calibreold' in lieu of a file called 'calibreold'!
- better allowed installation to different location than default "C:\Program Files".
- improved direct download of PG Australia ebooks. Allowed local cached copy to
be retained using --keepzip or --keephtm; avoids subsequent PGA website downloads.
- implemented creation of eReader .pdb when using calibre v0.6 (beta)
- fixed handling of single dash ("-") options
- improved print statements feedback
- better handling of PGA metadata within .htm
- better handling of important/necessary text after "THE END" but before PGA blurb.
- allowed existing .txt CHARSET to be used for generated .htm meta content-type
- better handling of --pbnofirst when <h1> already used as a pb tag
- misc. PGA .htm fixes for color and removed fixed fontsize for <p> and <table>
v0.4 - June 10, 2009
- added ability to directly download PG Australia ebooks using their EText-No. AND URL link
to the .html placed as the Input file.
For example, use: --PGnum 1547A & http://gutenberg.net.au/ebooks07/0700941h.html
Note that downloading .zip is fine, but .txt is not yet fully functional
- improved Custom Perl Search and Replace functionality. Still need to use \" for any ";
however, due to a DOS limitation, ^ can't be used yet!
- minor code/html fixes.
v0.3 - June 4, 2009
- add "start" anchor when the PG preamble is retained
- remove any stray <br>'s from metadata.
- fix GUI options loading; now properly remembers the 'search' and 'replace' strings.
The user must ensure that any " or / are escaped by \ within those strings.
- simple PG title page added when --cover (GUI: 'Extract cover') specified (would be better to
take a snapshot of this as a "cover" image)
- option '--imgsrc' (GUI: 'Keep <img> src only') now removes "width" elements from
within preceding <div class=figcenter> which caused images not to be centered in .epub's
v0.2 - June 3, 2009
- removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
- minor GUI / files cleanup
v0.1 - June 2, 2009
- initial public release
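
To illustrate the kind of clean-up the '--imgsrc' option is described as doing in the v0.3 notes (keep only the src on each <img>, and remove "width" from the enclosing figcenter <div>), here is a rough Python sketch. It is not GuteBook's Perl code: the function name is hypothetical, and it assumes the width is given as a CSS declaration inside the div's style attribute.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)
FIGCENTER_DIV = re.compile(r"<div\b[^>]*\bfigcenter\b[^>]*>", re.IGNORECASE)
# Matches "width: ...;" but not "max-width: ...;" (assumed CSS-style width).
WIDTH_DECL = re.compile(r"(?<![-\w])width\s*:\s*[^;\"']+;?\s*", re.IGNORECASE)
EMPTY_STYLE = re.compile(r"""\s*style\s*=\s*(""|'')""", re.IGNORECASE)


def keep_img_src_only(html):
    """Reduce <img> tags to their src and drop width from figcenter divs."""

    def strip_img(match):
        src = SRC_ATTR.search(match.group(0))
        # Leave the tag untouched if it has no src attribute at all.
        return "<img %s />" % src.group(0) if src else match.group(0)

    def strip_width(match):
        tag = WIDTH_DECL.sub("", match.group(0))
        # Remove the style attribute if nothing is left inside it.
        return EMPTY_STYLE.sub("", tag)

    html = IMG_TAG.sub(strip_img, html)
    return FIGCENTER_DIV.sub(strip_width, html)


sample = ('<div class="figcenter" style="width: 400px;">'
          '<img src="images/a.jpg" width="400" height="600" alt="x" /></div>')
print(keep_img_src_only(sample))
```

Regular expressions are enough for a sketch like this, but real PG HTML varies widely, so an HTML parser would be the safer choice for anything beyond simple cases.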
__________________
-Nick ‹The REB1200 Guy› Have you tried GuteBook yet?
Last edited by nrapallo; 09-06-2009 at 10:58 PM. Reason: added some samples of EText-No. 28700 (Robin Hood by Paul Creswick)

#2
GuteBook/Mobi2IMP Creator
FAQ
Reserved for FAQ/Tutorial.
1. Do I have to use the Windows GUI?
No, that's optional, but it is easier than remembering and manually typing the required switches. The Perl script came first and is perfectly usable on its own. The 'samples' directory in the GuteBook Install directory shows some 'command line' examples which can be used with the Perl script.

2. Do I have to use/select an ebook output format?
No, but then GuteBook will only preprocess the .htm; it will not create any ebooks or set up the batch file used to re-generate the ebooks after re-editing the modified .htm. However, you can use the resulting .opf with, say, Mobipocket Creator to manually generate a .mobi and then feed that to calibre or any other mobi2... program (like Mobi2IMP). The choice is yours!

3. How do I download a Project Gutenberg Australia book?
Quick HOW-TO example:
Last edited by nrapallo; 10-05-2009 at 11:00 PM.

#3
book creator
Posts: 4,917
Karma: 16363
Join Date: Oct 2008
Location: Luxembourg
Device: Cool-er, Cybook Gen 3, Ipod Touch, Acer Aspire One
Congratulations.
I tried your script extensively. It works really well and the resulting eBooks are more than adequate, even great compared to ManyBooks' efforts. As always, your GUI is well thought out and works admirably. A shame that there is no auto creation of covers though, but I am not aware of any freeware or open-source software that does this.

Apart from that and some coding idiocies coming from the Gutenberg coders (making TOCs with page numbers as links, for example), the generated eBooks are instantly readable and the different formats compare favorably. As you're aware yourself, there are still some problems with illustration sizes in ePub.

For a first version, this is superb work! K to you!

As an aside for iPhone/iPod users: Use this to convert your Gutenberg books to ePub for Stanza. They will be perfectly formatted and you can autogenerate a cover. Best solution so far!
__________________
Deutsches MobileRead Forum rocks!
Last edited by mtravellerh; 06-02-2009 at 05:08 AM.

#4
WWHALD
Posts: 22,830
Karma: 86918
Join Date: Dec 2007
Location: Paris, France
Device: ebookwise 1150
wow nick, you've done it again !! thank you so much for all your hard work on these brilliant tools which make ebook creation so much easier. karma to you for this !
__________________
I don't want none of that mischief on my eels! - pdurrant Hurricane Zelda of the Amazing Raining Frogs Join Adorable Madness -.-- --- ..- / -.-. .- -. -. --- - / .-. . ... .. ... - / - .... . / ..- -. ..- - - . .-. .- -... .-.. . / ... .. .-.. .-.. .. -. . ... ... "Resistance is futile, y'all." --DixieBorg by popular demand, we bring you the next exciting avatar in zelda's wardrobe, by the inimitable WetDogEared and his amazing dancing mice !! a round of applause, everybody ! |

#5
GuteBook/Mobi2IMP Creator
Quote:
The Perl script could also allow one to use a generic cover page placed in, say, the install directory. While this cover image would be external to the source .htm for .mobi ebooks, the other ebook formats would probably use a cover.htm page placed before the source .htm. This way it would be more compatible across all formats.

Quote:
Also missing, but a worthy addition, is autogenerating a Table of Contents ("TOC") and placing it at the end. However, most PG HTML versions already have a "Contents" section, so I'll wait and see if there is demand for such a TOC feature.

Obviously, working with HTML as a starting point makes it easier to get all the "bells and whistles" we are used to seeing in hand-crafted ebooks created by those, like yourself, that do a marvellous job! In future, once the experimental PG .mobi and .epub offerings become more standard, I can switch to using those as input instead of the HTML versions.

Working with .txt may require much more "polishing" by hand. Currently, GutenMark transforms any .txt-only ebooks into acceptable .htm ebooks. I may incorporate this ability within the Perl script using gut.pl (or newgut.pl discussed here)!

While GuteBook cannot be expected to properly detect and handle ALL PG quirks and idiosyncrasies, it makes a valiant attempt. I can improve the Perl script to "accommodate" any easily fixed quirk once it is made known which EText-No. PG ebook displays it.

If you experience any formatting glitches, you can post your findings/fixes here and discuss/support their inclusion in future versions of GuteBook. The squeaky wheel...
Last edited by nrapallo; 06-02-2009 at 11:08 PM.

#6
Groupie
Posts: 158
Karma: 442
Join Date: Sep 2008
Location: Back of the cupboard
Device: Sony Reader PRS-505
The biggest problem I've found with PG texts is that some of them have lost information contained in the original source book, e.g.,
- Italics: many PG texts represent italics using UPPER CASE, which confounds real upper case with italics. For example, the following source will all be represented the same in the PG text (as HELLO WORLD):
  hello world
  Hello World
  HELLO WORLD
- No "smart quotes". Opening and closing quotes are represented by the same character, sometimes a double quote, and sometimes a single quote. When single quotes are used, this is confused with apostrophes, which are usually represented by single quotes.
- Confusion between hyphens and em-dashes. Hyphens should be represented by "-", and em-dashes by "--" in the PG text, which is easy to convert, but this is not always the case.
- Indented text.

In my own private utilities I fix the above: convert to using opening/closing double quotes, correct apostrophe characters, correct use of italics etc. I do this automatically for many cases, and for the more difficult ones, I prompt the user for their resolution (what the correct character should be).

I wonder what your thoughts are on this and how it applies to GuteBook?
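
The two "easy" cases in the post above ("--" for em-dashes, and straight double quotes that open or close depending on position) can be sketched in a few lines. This is an illustrative Python sketch only, not the poster's private utilities: the function names are hypothetical, the opening/closing rule is a simple assumed heuristic, and single quotes and UPPER CASE italics are deliberately left alone because, as the post notes, they are ambiguous and may need a human decision.

```python
import re


def dashes_to_em(text):
    """Turn the PG "--" convention into an em-dash; single hyphens stay as-is."""
    # Only an isolated pair of hyphens is converted; longer runs are left alone.
    return re.sub(r"(?<!-)--(?!-)", "\u2014", text)


def smart_double_quotes(text):
    """Replace straight double quotes with opening/closing quotes (heuristic)."""
    out = []
    prev = ""
    for ch in text:
        if ch == '"':
            # Assumed rule: a quote at the start of the text, or after
            # whitespace, an opening bracket, or an em-dash, opens a quotation.
            opening = prev == "" or prev.isspace() or prev in "([{\u2014"
            out.append("\u201c" if opening else "\u201d")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


def polish(text):
    """Apply the dash fix first so the quote heuristic can see em-dashes."""
    return smart_double_quotes(dashes_to_em(text))


print(polish('"Stop--wait," he said. It was a well-known trick.'))
```

Anything the heuristic cannot decide safely (apostrophes versus single quotes, real capitals versus italics) is better handled by prompting the user, as described in the post.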

#7
GuteBook/Mobi2IMP Creator
Quote:
And you are the "ideal" candidate to make use of GuteBook's simultaneous generation of .epub/.lrf and .imp versions!

Let me know if you need any help setting this up (a Mobi2IMP and PDFRead installation procedure ripoff).

#8
GuteBook/Mobi2IMP Creator
Quote:
When dealing with .txt input, I initially chose to use GutenMark, as my goal was to steer away from "reproducing" its functionality and instead concentrate on making the resulting .htm work/display better in dedicated ebook readers. So while I applaud your efforts, it's not my focus here.

I did look at many PG .txt to .htm routines (gut.pl, newgut.pl, gtxt2html.pl and even gutenbrowser) and I know it is a tremendous undertaking, so I have just reserved the right to address this issue in future releases.

Are your utilities/code freely available? In Perl?
Last edited by nrapallo; 06-02-2009 at 11:50 AM. Reason: typo

#9
GuteBook/Mobi2IMP Creator
Just added some samples of EText-No. 28700 (Robin Hood by Paul Creswick) to post #1 above.
Enjoy!

#10
Groupie

#11
GuteBook/Mobi2IMP Creator
Quote:

#12
GuteBook/Mobi2IMP Creator
Added a "stripped-down" version with no Windows GUI and with no Windows Installer for those that don't want/like bloat.
Just download and unzip the GuteBook-noGUI-noInstaller.zip file in post #1 above and run the programs/batch files in the 'bin' directory. At least now, I can get JSWolf to look at (and hopefully try) it... Been there, done that...

#13
GuteBook/Mobi2IMP Creator
version 0.2
Updated GuteBook and Installer to version 0.2 in post #1 above.
REVISIONS:
v0.2 - June 3, 2009
- removed unwanted blank page at start in .lrf caused by use of tags '<pre></pre>'
- minor GUI / files cleanup

#14
Member
Posts: 17
Karma: 10
Join Date: Apr 2009
Location: Denver, CO
Device: Kindle2
If this is addressed elsewhere, my apologies, but is there anything like this for Mac users? I've been noticing a kinda profound lack of Mac apps for ebook creation/conversion.
I'm playing with Calibre, but curious if there's others out there. The tool looks great Nick! |

#15
GuteBook/Mobi2IMP Creator
Quote:
I know that calibre and eBook Publisher work in Mac OS X; however, the gutebook.pl script relies on the Windows SBPubX COM interface for .imp creation (my ebook reader's format), so it may also require the Windows eBook Publisher to be installed under Wine. I've done this previously for my Mobi2IMP software on my Linux netbook using the same NSIS Installer.

I'm sorry, but I don't have access to any Mac to test and/or debug this on. Perhaps someone else can help make this work under Mac OS X. I think it's doable! Can you at least execute the Perl script?