Old 05-16-2009, 01:21 AM   #5
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by ahi
My question is, am I better off converting existing eBooks into some editable format (like Project Gutenberg's EPUBs), do my fixing, then convert them back; or is it better to just work straight from the plaintext and make my own eBooks from scratch?
It really depends on the book. Ironically, I have often found that scanning the paper book, running OCR on it, and fixing the result myself is less work than taking a "finished" e-book and fixing that. In other cases, starting with plaintext (or stripping all tags from HTML) and fixing that can give quite a satisfactory result.

In any case, I very much doubt you will find a way to do the fixing fully automatically (by running a script) - at least I haven't yet, and I have been trying since I bought the reader last March. The closest I got is a semi-automatic process: checking each book manually and devising regular expressions to fix its particular quirks.
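To give a rough idea of what such a semi-automatic pass might look like, here is a minimal Python sketch. The rule list, the `apply_rules` name and the sample text are all made up for illustration; they are not the rules I actually use, and every book needs its own set. The point is only that each rule reports how many times it fired, so you can check by hand whether it did what you expected.

```python
import re

# Hypothetical per-book rules: (description, pattern, replacement).
# Illustrative only; real rules have to be devised for each book.
RULES = [
    ("rejoin words hyphenated across a line break", r"(\w)-\n(\w)", r"\1\2"),
    ("collapse runs of spaces or tabs", r"[ \t]{2,}", " "),
    ("strip trailing whitespace", r"[ \t]+$", ""),
]


def apply_rules(text, rules):
    """Apply each rule in order; return the fixed text and per-rule hit counts."""
    report = []
    for description, pattern, replacement in rules:
        # re.subn returns the new string and the number of substitutions made.
        text, hits = re.subn(pattern, replacement, text, flags=re.MULTILINE)
        report.append((description, hits))
    return text, report


# Made-up sample standing in for a chunk of OCR output.
sample = "The quick  brown fox jum-\nped over the lazy dog.  \n"
fixed, report = apply_rules(sample, RULES)

for description, hits in report:
    print(f"{hits:3d}  {description}")
```

Note that the first rule would also glue together a genuine compound such as "well-known" if it happened to be split at the hyphen, which is exactly why the hit counts still need a manual look.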