I saw the thread about OCRing books... It really isn't 4 hours, more like 4 days. I've been doing this for a while with printed books of mine that can't be found as ebooks.
1. Scan (and destroy the book, since it has to be opened flat)
2. First complete read-through inside the OCR program, paying special attention to every spot where it is unsure
3. Convert to OpenOffice
4. Spellchecking pass + macro pass (there are macros written to catch OCR errors, hundreds of them)
5. Formatting (I use sfb->fb2), putting tags everywhere they belong
6. Read the resulting copy again, clearing leftover errors
(7). If you can survive it, another reading pass: there are always 15-30 errors left after the first 6 steps
That's what it takes to produce a good electronic copy; anything less is a joke.
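For anyone curious what the macro pass in step 4 does, here is a rough idea of it as a standalone Python sketch. This is not the macros mentioned above (those run inside OpenOffice). The rule list is hypothetical and covers just a few typical OCR slips. Each rule is a regex that flags a suspect spot for a human to check; nothing gets fixed automatically.

```python
import re

# Hypothetical rules (NOT the author's actual macros): each pairs a short
# description with a regex for a pattern that often signals an OCR misread.
OCR_RULES = [
    ("digit inside a word", re.compile(r"\b[A-Za-z]+[0-9]+[A-Za-z]+\b")),
    ("word hyphenated across a line break", re.compile(r"\w+-\n\w+")),
    ("space before punctuation", re.compile(r"\w +[,.;:!?]")),
    ("doubled word", re.compile(r"\b(\w+) \1\b", re.IGNORECASE)),
    ("lowercase after sentence end", re.compile(r"[.!?] [a-z]")),
]


def find_suspects(text):
    """Return sorted (line_number, rule_description, matched_text) tuples."""
    hits = []
    for description, pattern in OCR_RULES:
        for match in pattern.finditer(text):
            # Line number where the match starts (1-based).
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((line_number, description, match.group(0)))
    hits.sort()
    return hits


sample = "The the book was scanned.\nIt has a he1lo in it , and more."
for hit in find_suspects(sample):
    print(hit)
```

Every hit still needs a human look. For example, "that that" can be correct English, which is why these passes only narrow down where to read carefully and don't replace the reading passes in steps 6 and 7.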