12-14-2015, 08:00 PM   #16
ProDigit
I'm entering the conversation quite late.
I run the scan through the OCR and then clean up the text in Notepad++.
From within Notepad++, depending on how many scans you do, you can create macros to start removing errors.

I have about five types of older books (black text on yellowed paper). I noticed the scanner makes the same mistakes over and over, like changing "I" to "L", or "are" to "ame", or something like that.

Notepad++ has a very advanced "search and replace" option. When I start reading the book from the top and find an error (say it wrote "plumtree" as "plumlree"), I search and replace (*lree to *tree). That way it fixes future 'plumlrees' as well as future 'applelrees' or 'pearlrees'.
If you do a few similar books at a time, you learn your OCR's errors and can map them in a macro.
Write the macro once, then apply it to each new book before you even start correcting it by hand.
If you feed the OCR program scans from different sources, this method will not work very well, or not at all.
It mostly works only when you scan the books yourself on one and the same scanner, usually at the same resolution.
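
For anyone who would rather script this than record a Notepad++ macro, here is a rough Python sketch of the same idea: a list of recurring OCR mistakes applied to the whole text in one pass. It is not what I actually run. The `CORRECTIONS` table and the `apply_corrections` function are made-up names, and the two rules are only the examples from this post, so treat it as a starting point and build your own table from your own scanner's mistakes.

```python
import re

# Hypothetical correction table: (pattern, replacement) pairs collected while
# proofreading earlier scans from the same scanner. The entries below are only
# the examples mentioned in this post, not a complete or tested rule set.
CORRECTIONS = [
    (r"lree(s?)\b", r"tree\1"),  # plumlree -> plumtree, applelrees -> appletrees
    (r"\bame\b", "are"),         # standalone "ame" misread for "are"
]


def apply_corrections(text, corrections=CORRECTIONS):
    """Apply every (pattern, replacement) pair to the text, in order."""
    for pattern, replacement in corrections:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    sample = "The plumlree and the applelrees ame in bloom."
    print(apply_corrections(sample))
    # prints: The plumtree and the appletrees are in bloom.
```

The `\b` word boundaries keep the "ame" rule from touching words like "same" or "name". A rule as broad as "L" to "I" would need the same kind of care, which is why it is left out here.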

For low-resolution scans like the one above, I would recommend trying to download a text copy of the book, loading it side by side with the picture, and manually applying corrections or modifications to the text. That is the only alternative to correcting a rather lousy OCR conversion, which will probably look bad no matter what software you use.

Last edited by ProDigit; 12-14-2015 at 08:03 PM.