#16 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
I'm entering the conversation quite late.
I let it run through the OCR, then clean up the result in Notepad++. Depending on how many scans you do, you can create macros in Notepad++ to start removing errors. I have about five types of older books (black text on yellowed paper), and I noticed the scanner makes repetitive mistakes, like changing "I" to "L", or "are" to "ame". Notepad++ has a very advanced search-and-replace option. I start reading the book from the top, and when I find an error (say it wrote "plumtree" as "plumlree"), I search and replace (*lree to *tree). That way it replaces future "plumlrees" as well as future "applelrees" or "pearlrees".

If you do a few of the same kind of book at a time, you can learn your OCR's errors and map them in a macro. Write the macro, then apply it to the book before you even start correcting by hand. When you're working with different sources in an OCR program, this method will not work very well, or at all. It mainly works when you scan the books yourself on one and the same scanner, usually at the same resolution.

For low-resolution scans like the one above, I would recommend trying to download a text copy of the book, loading it side by side with the picture, and manually applying corrections or formatting changes to the text. That is the only alternative to correcting a rather lousy OCR conversion (which will probably look bad no matter what software you use).

Last edited by ProDigit; 12-14-2015 at 08:03 PM.
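For anyone who would rather script this than record a Notepad++ macro, here is a rough Python sketch of the same search-and-replace idea. The `OCR_FIXES` table and the `apply_fixes` function are made-up names for illustration, and the two patterns are just the examples from the post; you would have to build your own list from the mistakes your scanner actually makes.

```python
import re

# Hypothetical correction table: each entry pairs a regex for a recurring
# OCR misread with its replacement. These two rules are illustrative only.
OCR_FIXES = [
    # "...lree" / "...lrees" at the end of a word -> "...tree" / "...trees"
    (re.compile(r"(\w+)lree(s?)\b"), r"\1tree\2"),
    # the standalone word "ame" misread for "are"
    (re.compile(r"\bame\b"), "are"),
]

def apply_fixes(text, fixes=OCR_FIXES):
    """Run every substitution in order and return the corrected text."""
    for pattern, replacement in fixes:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    sample = "The plumlree and the pearlrees ame old."
    print(apply_fixes(sample))
    # prints: The plumtree and the peartrees are old.
```

The word-boundary anchors keep the rules from touching ordinary words such as "game". As with the macro, a table like this is probably only useful for books scanned on the same scanner at the same settings.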