04-24-2008, 08:20 PM   #7
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
I've tried GutenMark (and gut.pl). The results are decent, but they need too much hand-editing afterwards. The ManyBooks editions do a good job with the table of contents, but they don't do any typographic cleanup (dashes, ellipses, quotes, etc.).

I've been playing around with the list I've got above, and eventually I'm going to update it (for example, escaping '&' must occur before anything else, otherwise the entities inserted by later steps get escaped as well). The process will always require human intervention (simply because each Gutenberg transcription was done by a person), and an important part of the process is learning the original transcriber's style.
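
To show why the '&' ordering matters, here is a rough Python sketch of a cleanup pass. It is not the list from my earlier post, just an illustration: the function name clean_typography and the particular rules (double hyphen, three dots, straight double quotes) are assumptions picked for the example, and real texts will need more rules plus a human check.

```python
import html
import re


def clean_typography(text: str) -> str:
    """Hypothetical cleanup pass for plain Gutenberg-style text.

    Returns text with HTML entities for dashes, ellipses and double quotes.
    """
    # Step 1: escape '&', '<' and '>' FIRST. If this ran last, the '&' in
    # entities such as '&mdash;' added below would become '&amp;mdash;'.
    text = html.escape(text, quote=False)

    # Step 2: a double hyphen is the usual plain-text stand-in for an em dash.
    text = text.replace("--", "&mdash;")

    # Step 3: three dots, optionally separated by single spaces, become an ellipsis.
    text = re.sub(r"\.\s?\.\s?\.", "&hellip;", text)

    # Step 4: a straight double quote at the start of the text, or after
    # whitespace or an opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[])"', r"\1&ldquo;", text)

    # Step 5: any straight double quote still left is treated as a closing quote.
    text = text.replace('"', "&rdquo;")
    return text


if __name__ == "__main__":
    print(clean_typography('"Wait--Tom & I..." he said.'))
```

The quote rule is the weakest part. Nested quotes, apostrophes and each transcriber's own habits are exactly where the hand-editing comes in.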

Two items that help: a reference on the MobileRead wiki to
Code:
<mbp:pagebreak/>
for layout control, and the Google Books archive of scanned books (for checking the original layout and typography).
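
As a small sketch of how that tag might be applied in bulk, the snippet below puts a page break in front of every chapter heading except the first. The assumptions are mine: that chapters are marked with h2 tags in the HTML, and the function name add_pagebreaks is made up for this example. Check the scanned original to see where breaks really belong.

```python
import re


def add_pagebreaks(html_text: str, tag: str = "h2") -> str:
    """Insert <mbp:pagebreak/> before each opening heading tag except the first.

    Assumes (hypothetically) that every chapter starts with the given tag.
    """
    pattern = re.compile(rf"<{re.escape(tag)}\b", re.IGNORECASE)
    seen = 0

    def repl(match: re.Match) -> str:
        nonlocal seen
        seen += 1
        # Leave the first heading alone so the book does not open on a break.
        if seen == 1:
            return match.group(0)
        return "<mbp:pagebreak/>\n" + match.group(0)

    return pattern.sub(repl, html_text)


if __name__ == "__main__":
    print(add_pagebreaks("<h2>One</h2><p>a</p><h2>Two</h2>"))
```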