Thread: OCR engine
03-23-2014, 12:59 AM   #21
Tex2002ans
Wizard
Quote:
Originally Posted by Hamlet53
Over the past few months I have been digitizing many of my old books. I use a setup similar to what the fellow in the attached video uses to remove the binding and yield uniform size and smoothly cut pages.
Wow, that double-sided feed scanner seems NICE. The one I used was single-sided, so we had to run every page through a second time in the other direction, which doubled the scanning time.

If I were doing book scanning seriously, and on more of a mass scale, I would definitely spend more money up front on a double-sided scanner.

Quote:
Originally Posted by Hamlet53
I find that the larger the font the more accurate the OCR process is.
It doesn't depend on font size so much as on how "crisp" the image is (the DPI, how good the lighting was, how good the scanning hardware is, how good the source material is, ...). A whole bunch of different variables are at play, and as I mentioned in one of the other posts, a scan can "look fine" to the human eye but go horribly wrong when OCRed.
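To put rough numbers on the DPI part: a typographic point is 1/72 of an inch, so the number of pixels a letter gets depends on both the font size and the scan resolution. Here is a minimal sketch of that arithmetic (the function name is just something I made up for illustration, and it only covers DPI, not lighting or source quality):

```python
def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate em height, in pixels, of text scanned at a given DPI."""
    points_per_inch = 72  # one typographic point is 1/72 of an inch
    return font_size_pt / points_per_inch * dpi


# Compare the same 10 pt text at three common scan resolutions.
for dpi in (150, 300, 600):
    print(f"10 pt text at {dpi} DPI is about {em_height_px(10, dpi):.1f} px tall")
```

So small print scanned at a high DPI can end up with more pixels per letter than large print scanned at a low DPI, which is one reason font size alone is not a reliable predictor.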

Also keep in mind that writing, highlighting, and other markings on the page will severely lower the speed and accuracy of the OCR (people who write in books MUST BE DESTROYED).

We also had a lot of discussion in this topic (about digitizing/OCRing math books): https://www.mobileread.com/forums/sho...d.php?t=228413

See my Post #16 showing off a few real-life examples of some of the worst markings I have run across: https://www.mobileread.com/forums/sho...2&postcount=16

Also, back to the different OCR programs: there is a free, open-source OCR engine called Tesseract, originally developed at HP and sponsored by Google since 2006: https://en.wikipedia.org/wiki/Tesseract_%28software%29
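If you want to try Tesseract on a scanned page, here is a rough sketch of calling its command-line tool from Python. It assumes the `tesseract` executable is installed and on your PATH; the helper names and the file name in the note below are hypothetical, not anything from this thread.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, lang: str = "eng") -> list:
    """Build the argument list for the tesseract command-line tool."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognised text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout
```

Calling `ocr_image("page_001.png")` would return whatever text Tesseract finds on that page. How accurate it is still comes down to the image quality issues above.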