Thread: PRS-500 New tool: BBeB Binder
01-01-2007, 04:15 PM   #22
cmumford
Device: PRS-500
Quote:
Originally Posted by FangornUK
Nice work! Looks very promising.

You seem to be doing some Gutenberg-specific detection. A simple clean-up for the HTML versions is page-number stripping; I do that in gutlrf.pl like so:
$_ =~ s#<span class='pagenum'>.*</span>## ;
$_ =~ s#<span class=\"pagenum\">.*</span>## ;

I'll post more bug reports to the Google Code site.
Can you point me to a book on Gutenberg that has these page-number spans? I was using The Adventures of Sherlock Holmes, but it doesn't have any.

BTW, does Gutenberg have a recommended HTML format that you're aware of, or are they at the mercy of every submitter's idea of what good HTML is?

If I wind up doing a bunch of HTML cleanup, I'll probably implement it so that it reads the cleanup parameters (maybe like the two patterns you posted above) from a data file, so that users can add their own.
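
To make the data-file idea concrete, here is a rough sketch in Python (not the tool's actual code, and nothing like this is implemented yet). The rule format is purely an assumption: one regex per line, optionally followed by a tab and a replacement string, with blank lines and lines starting with # ignored. The names parse_rules, clean_html and SAMPLE_RULES are hypothetical, and the sample page-number markup is made up for illustration. Note that it uses a non-greedy .*? rather than the greedy .* in the quoted Perl, so that two spans on one line don't swallow the text between them.

```python
import re

def parse_rules(text):
    """Parse cleanup rules: one regex per line, optional TAB + replacement."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        pattern, _, replacement = line.partition("\t")
        rules.append((re.compile(pattern), replacement))
    return rules

def clean_html(html, rules):
    """Apply every rule in order; each rule replaces all of its matches."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html

# Hypothetical contents of a user-editable rules file.
SAMPLE_RULES = """\
# strip page-number spans (class in single or double quotes)
<span class=['"]pagenum['"]>.*?</span>
"""

# Made-up sample input, not taken from a real Gutenberg book.
sample = "<p>It was<span class='pagenum'>[Pg 12]</span> a dark night.</p>"
cleaned = clean_html(sample, parse_rules(SAMPLE_RULES))
print(cleaned)  # <p>It was a dark night.</p>
```

In practice the rules text would come from a file on disk (for example via open(path).read()), so users could add their own patterns without touching the code.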