Quote:
Originally Posted by mb2u
All my prospective conversions are non-fiction.
|
Glorious!
If you are serious about OCRing and getting high quality work out there, I would not mind teaching you everything I know. (I am available over AIM/YIM/MSN/Skype/email).
While you can OCR for your own personal benefit, the benefit does not outweigh the cost (I spend about 8-15 hours just to get a great EPUB, but when you are just starting out, you might spend 40+ hours on a book).
In my opinion, you should try to tackle works that are in the public domain, or books that are released as CC (Creative Commons). After finishing your OCR, and making a clean EPUB, you can then post it on MobileRead/elsewhere so that the ENTIRE WORLD can benefit from your conversion (instead of just you).
Archive.org has scans of a massive number of public domain books. Or if you are interested in some "training materials", I have a bunch of journal articles that need OCR (~13 pages each).
I believe tackling the easy/short stuff first would have built up my skills/familiarity with the tools way faster, and it definitely keeps the motivation up (it makes you feel like you are actually ACCOMPLISHING SOMETHING).
When I first jumped into OCR I decided it would be a good idea to tackle all the hard stuff first... I wish I hadn't done that!
When I used to tackle these large books that were complex/way out of my league, I would spend an entire week on one and feel like I had gotten nowhere!
Quote:
Originally Posted by mb2u
I know what you mean... it would destroy the flow of the story correcting errors in fiction. It would demolish it!
|
For the few fiction books that I actually wanted to read (that were PDF only)... I pretty much just had to feed them through OCR, export, split the chapters really fast, and run a few basic cleanup regexes. Then I read through each book in Sigil and fixed the errors as I came across them while reading. Took forever, but nothing was spoiled.
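To give a rough idea of what "a few basic cleanup regex" can look like, here is a minimal sketch in Python. These are not my exact expressions, just illustrative patterns for common OCR debris; the function name `basic_ocr_cleanup` and the sample text are made up for the example, and you would adapt the patterns to whatever your OCR program actually spits out.

```python
import re


def basic_ocr_cleanup(text: str) -> str:
    """Apply a few illustrative cleanup passes to raw OCR text (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Join single line breaks inside a paragraph; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Remove stray spaces before punctuation: "clean ," -> "clean,".
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    return text.strip()


sample = "The conver-\nsion was  clean ,\nmostly.\n\nChapter two"
print(basic_ocr_cleanup(sample))
```

Be careful with the hyphen pass: it will also glue together genuine compounds that happened to break at the end of a line (e.g. "well-" / "known" becomes "wellknown"), so those still have to be caught while reading through the book.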