#1 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Yet Another Gutenberg Book converter
Following my odyssey to find a near perfect Gutenberg automatic conversion method for the Sony Reader...
This program, gutlrf.pl (a Perl script), works with libprs500 (that library does the hard work) to convert Gutenberg HTML books into BBeB LRF (Sony's eBook format) books for the Sony Reader. The process is designed so that no manual editing of the HTML files is required; it even downloads the files for you. It can also convert text-based Gutenberg books with the help of Gutenmark.
The gutlrf.pl script will retrieve, unpack and clean the book, extract the author and book title, and then call HTML2LRF (which does the hard work) to convert it into a BBeB LRF file with support for markup, images and contents. It tries to put new chapters onto new pages, usually based on the H2 HTML tag. The Gutenberg ZIP files don't always contain the correct directory structure; gutlrf.pl will automatically fix this.
There are full instructions inside the ZIP file. Download it from here
Last edited by FangornUK; 05-29-2007 at 11:44 AM.
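
As a rough illustration of the "extract the author and book title" step: the sketch below is in Python rather than Perl and is not taken from gutlrf.pl. It assumes the book carries the usual Gutenberg header lines of the form `Title: ...` and `Author: ...`; the function name `extract_title_author` and the sample text are made up for this example.

```python
import re


def extract_title_author(html):
    """Return (title, author) from 'Title:' / 'Author:' header lines.

    Missing fields come back as None. This is an assumed approach,
    not the logic used by gutlrf.pl.
    """
    # Drop HTML tags so that "<p>Title: X</p>" becomes "Title: X".
    text = re.sub(r"<[^>]+>", "", html)
    found = {}
    for key in ("Title", "Author"):
        # Match the field at the start of a line and trim the
        # surrounding spaces and tabs from its value.
        match = re.search(rf"^[ \t]*{key}:[ \t]*(.+?)[ \t]*$", text, flags=re.MULTILINE)
        found[key] = match.group(1) if match else None
    return found["Title"], found["Author"]


# Hypothetical header, only for demonstration.
sample = "<pre>\nTitle:   An Example Book\n\nAuthor:  A. N. Author\n</pre>"
print(extract_title_author(sample))
```

Trimming the whitespace inside the pattern also covers the "leading spaces" problem that can show up in author and title fields.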
#2 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Here's a sample Gutenberg HTML book (19695) converted using these tools.
#3 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Also, here's a Gutenberg plain-text book (number 36) that was run through Gutenmark to generate an HTML file and then run through my scripts (only splitbook.pl).
Last edited by FangornUK; 11-10-2006 at 03:53 PM.
#4 |
Fanatic
Posts: 556
Karma: 1057213
Join Date: Sep 2006
Location: North Eastern U.S.
Device: Sony Reader
|
The same question that I asked igorsk, and never got the answer: Is html2lrf able to process HTML books larger than about 600KB? I had pretty good luck with smaller books, but the larger ones seemed to crash html2lrf.
#5 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
As a workaround, try splitting the HTML into several files. All the HTML parsing code is inside LBParser.dll; I don't have anything to do with it.
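
One way the splitting workaround could look, sketched in Python. This is not splitbook.pl; it simply cuts the HTML in front of each `<h2>` heading and groups the pieces into chunks below a size limit. The name `split_html_by_h2` and the `max_bytes` default of 500,000 are assumptions, picked only to stay under the roughly 600KB mentioned above, not a documented limit of html2lrf.

```python
import re


def split_html_by_h2(html, max_bytes=500_000):
    """Split html at <h2> boundaries into chunks of at most max_bytes.

    A single chapter that is larger than max_bytes stays in one
    oversized chunk, because cuts are only made in front of <h2> tags.
    """
    # The lookahead keeps each <h2> tag attached to the text after it.
    parts = re.split(r"(?=<h2\b)", html, flags=re.IGNORECASE)
    chunks = []
    current = ""
    for part in parts:
        too_big = len((current + part).encode("utf-8")) > max_bytes
        if current and too_big:
            # Adding this chapter would exceed the limit: start a new chunk.
            chunks.append(current)
            current = part
        else:
            current += part
    if current:
        chunks.append(current)
    return chunks


# Tiny demonstration with an artificially small limit.
demo_html = "<p>intro</p><h2>One</h2><p>a</p><H2>Two</H2><p>b</p>"
for number, chunk in enumerate(split_html_by_h2(demo_html, max_bytes=30), start=1):
    print(number, chunk)
```

The chunks are fragments rather than complete HTML documents, so each one would still need its own `<html>` and `<body>` wrapper before being fed to a converter.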
#6 | |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Quote:
#7 |
Fanatic
Posts: 556
Karma: 1057213
Join Date: Sep 2006
Location: North Eastern U.S.
Device: Sony Reader
|
All right, great! Is it possible then to modify your script so it takes a direct link to the HTML, and not just a ZIP file?
UPD: I see that gutlrf.pl actually only gets and unzips the HTML; then I have to use splitbook.pl to split it and feed it to html2lrf. Sounds like everything is already there, no new changes required.
Last edited by porkupan; 11-10-2006 at 03:44 PM.
#8 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
I've just added an option to gutlrf.pl for specifying a ZIP file (for an already downloaded Gutenberg file). gutlrf.pl does more than just download and unzip the files: it cleans the Gutenberg file (and also adds the title and author) in preparation for splitbook.pl & HTML2LRF.
Last edited by FangornUK; 11-13-2006 at 05:39 AM.
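
For anyone handling an already downloaded ZIP by hand, here is a small Python sketch of locating the HTML file wherever it sits inside the archive. How gutlrf.pl actually deals with the directory structure is not described in this thread, so this is only an assumed approach; `find_html_members` and the archive member names are made up for the example.

```python
import io
import zipfile


def find_html_members(zip_source):
    """Return the names of .htm/.html members in a ZIP archive.

    zip_source may be a path or a file-like object. Members are found
    regardless of how deeply they are nested in the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".htm", ".html"))
        ]


# Demo with an in-memory archive (member names are illustrative only).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("12345-h/12345-h.htm", "<html></html>")
    archive.writestr("12345-h/images/cover.jpg", b"")
print(find_html_members(demo))
```

Searching by extension instead of by a fixed path means the HTML is found whether or not the archive has the expected top-level folder.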
#10 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Update:
Added support for an already unzipped Gutenberg HTML book.
Added option to gutlrf.pl to get it to automatically run splitbook.pl.
Strip leading spaces from author and title.
#11 |
Gadget Force®
Posts: 705
Karma: 2733
Join Date: Jun 2006
Location: The Netherlands
Device: Sony PRS-300 + Cybook with funny screen :P
|
I tried it, but I get this with several different files of around 800 KB:
#12 |
Member
Posts: 14
Karma: 10
Join Date: Jun 2005
|
I tried debugging the source code in Visual Studio 2005, but the project fails on a call to CreateNewBook(). I don't have the PDB files to debug why CreateNewBook() fails, so if anyone has any suggestions, please post.
#13 | |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Quote:
#14 | |
Gadget Force®
Posts: 705
Karma: 2733
Join Date: Jun 2006
Location: The Netherlands
Device: Sony PRS-300 + Cybook with funny screen :P
|
Quote:
Thanks, I will give that a try!
#15 |
Addict
Posts: 206
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Some more updates: Option to pass chapter split from gutlrf to splitbook. Better Chapter name extraction. Some bug fixes.