MobileRead Forums - View Single Post

Tex2002ans · 03-23-2014, 12:59 AM

Quote:

Originally Posted by Hamlet53

Over the past few months I have been digitizing many of my old books. I use a setup similar to what the fellow in the attached video uses to remove the binding and yield uniform size and smoothly cut pages.

Wow, that double-sided feed scanner seems NICE. The one I used was single-sided, so we had to run the pages through again in the other direction to get the back sides, which took double the time.

If I were doing book scanning seriously, and on more of a mass scale, I would definitely spend more money up front on a double-sided scanner.

Quote:

Originally Posted by Hamlet53

I find that the larger the font the more accurate the OCR process is.

It doesn't really depend on font size so much as on how "crisp" the image is (the DPI, how good the lighting was, how good the scanning hardware is, how good the source material is, ...). There are a whole bunch of different variables at play, and as I mentioned in one of the other posts, a page can "look fine" to the human eye but go horribly wrong when OCRed.
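
To make "crisp" a bit more concrete, here is a rough sketch of one common way to put a number on it: the variance of a Laplacian filter over the grayscale pixels. This is only an illustration, not something any particular OCR program is known to do internally. The function name `laplacian_variance` and the tiny hand-made "images" are made up for the example, and a real page would be loaded with an imaging library rather than typed in as lists.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Rough sharpness score for a grayscale image.

    `gray` is a list of rows, each row a list of pixel intensities (0-255).
    A 4-neighbour Laplacian is applied to every interior pixel and the
    variance of the responses is returned. Sharp edges give large positive
    and negative responses, so a higher variance suggests a crisper image.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
            responses.append(lap)
    return pvariance(responses)


# Hypothetical 4x6 test patches: a hard black-to-white edge versus the
# same edge smeared into a smooth ramp (as a blurry scan would do).
crisp_edge = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
blurry_edge = [[0, 51, 102, 153, 204, 255] for _ in range(4)]

crisp_score = laplacian_variance(crisp_edge)
blurry_score = laplacian_variance(blurry_edge)
```

The crisp patch scores higher than the smeared one. Any cutoff for "good enough to OCR" would have to be found by trial on your own scans, since it depends on DPI, lighting, and the source material.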

Also keep in mind that writing, highlighting, and other markings will severely lower the speed/accuracy of the OCR (people who write in books MUST BE DESTROYED).

We also had a lot of discussion in this topic (about digitizing/OCRing math books): https://www.mobileread.com/forums/sho...d.php?t=228413

See my Post #16 showing off a few real-life examples of some of the worst markings I have run across: https://www.mobileread.com/forums/sho...2&postcount=16

Also, back to the different OCR programs... There is also a free, open-source OCR engine called Tesseract (originally developed at HP, with development sponsored by Google since 2006): https://en.wikipedia.org/wiki/Tesseract_%28software%29
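
If you want to try Tesseract on a scanned page, a minimal sketch of driving its command-line tool from Python might look like the following. It assumes the `tesseract` executable is installed and on your PATH with English language data available. The helper names `build_tesseract_command` and `ocr_page`, and the file names in the usage comment, are hypothetical and only there for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for the basic CLI form:
    tesseract <image> <output base name> -l <language>
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_page(image_path, output_base, lang="eng"):
    """Run Tesseract on one scanned page and return the text file path.

    Tesseract appends ".txt" to the output base name for plain-text output.
    Raises RuntimeError if the executable cannot be found.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return output_base + ".txt"


# Hypothetical usage, with made-up file names:
# text_file = ocr_page("page_001.png", "page_001")
```

Running it per page like this keeps the OCR output lined up with the scan files, which makes it easier to go back and recheck the pages that came out badly.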