04-27-2016, 06:05 PM   #10
Tex2002ans
Quote:
Originally Posted by theducks
Paperport, the FREE OCR that came with my scanner. What you scan is what they try and OCR . 2 Col source is a pain. Lucky me, I rarely see it.

Side Note: Hmmm... I have been writing a Scan Tailor tutorial. Maybe I could toss some semi-related extra pre/postprocessing steps into it.

Depending on how much time you waste cleaning up the headers/footers in the OCR output, it might be best to preprocess the images with Scan Tailor and then crop the headers/footers right out, so that the OCR program can focus on just the body text:

Original Scan: [image: OriginalScan.png]
Scan Tailor: [image: ScanTailor.png]
Cropping: [image: Stripped.png]
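
If you would rather script the cropping step than do it by hand, here is a rough sketch of the idea in Python. It only builds an ImageMagick command line and does not run it. The function names, the file names, and the header/footer heights are all made up for illustration; you would have to measure the real values on your own scans. It also assumes the classic `convert` command (newer ImageMagick installs call it `magick`).

```python
def body_crop_geometry(width, height, header_px, footer_px):
    """Build an ImageMagick geometry string (WxH+X+Y) covering only the body text.

    header_px and footer_px are hypothetical measurements: how many pixels
    to cut off the top and the bottom of the page.
    """
    if header_px < 0 or footer_px < 0:
        raise ValueError("header and footer heights must not be negative")
    body_height = height - header_px - footer_px
    if body_height <= 0:
        raise ValueError("header and footer leave no body area")
    # Full page width, starting just below the header.
    return f"{width}x{body_height}+0+{header_px}"


def crop_command(src, dst, geometry):
    """Return the argument list for an ImageMagick crop of one page."""
    # -crop WxH+X+Y keeps that rectangle; +repage drops the leftover canvas offset.
    return ["convert", src, "-crop", geometry, "+repage", dst]


if __name__ == "__main__":
    geometry = body_crop_geometry(1000, 1600, header_px=120, footer_px=100)
    print(" ".join(crop_command("page_001.png", "body_001.png", geometry)))
```

This only makes sense after Scan Tailor has deskewed the pages and made them a uniform size, since a fixed crop box assumes the header and footer sit in the same place on every page.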

2 column source... luckily, I rarely come across that either. I would probably do something similar, though (come up with an ImageMagick way to split the pages in half). I may be contacting you via PM for some examples soon (or you could always contact me).
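
As a starting point for the page-splitting idea, here is a minimal sketch in Python. It is not a tested workflow, just the arithmetic plus the command lines. The function names, file names, and the optional `gutter_px` parameter are hypothetical, and it assumes the two columns are roughly centered on the page (which is usually only true after deskewing and margin cleanup). The simple case relies on ImageMagick's tiled crop, `-crop 50%x100%`, which cuts an image into left and right halves.

```python
def split_boxes(width, height, gutter_px=0):
    """Return (left, top, right, bottom) boxes for the left and right halves.

    gutter_px is a hypothetical extra: the width of the blank strip between
    the columns, half of which is trimmed off each side of the split.
    """
    mid = width // 2
    half_gutter = gutter_px // 2
    left_box = (0, 0, mid - half_gutter, height)
    right_box = (mid + half_gutter, 0, width, height)
    return left_box, right_box


def box_to_geometry(box):
    """Convert a (left, top, right, bottom) box into ImageMagick's WxH+X+Y form."""
    left, top, right, bottom = box
    return f"{right - left}x{bottom - top}+{left}+{top}"


def simple_split_command(src, dst_pattern):
    """Argument list for a plain 50/50 split using ImageMagick's tiled crop."""
    # With a pattern like "half_%d.png", the tiles are written as half_0.png and half_1.png.
    return ["convert", src, "-crop", "50%x100%", "+repage", dst_pattern]


if __name__ == "__main__":
    left_box, right_box = split_boxes(2000, 1500, gutter_px=40)
    print(box_to_geometry(left_box), box_to_geometry(right_box))
    print(" ".join(simple_split_command("spread.png", "half_%d.png")))
```

The explicit boxes are only worth the trouble if the gutter needs trimming; for a clean, centered scan the plain 50/50 split should be enough.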