Quote:
Originally Posted by theducks
Paperport, the FREE OCR that came with my scanner. What you scan is what they try and OCR. 2 Col source is a pain. Lucky me, I rarely see it.
Side Note: Hmmmmm... I have been writing a Scan Tailor tutorial. Maybe I could toss some semi-related pre/postprocessing tips into it.
Depending on how much time you waste cleaning up headers/footers in the OCR output, it might be best to preprocess the images with Scan Tailor and then crop the headers/footers right out, so the OCR program can focus on just the body text:
[Images: the original scan, the same page after Scan Tailor, and the page after cropping]
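If you'd rather script the header/footer crop than do it by hand, here is a minimal sketch. It assumes every page has the same size and that the header and footer bands have a fixed height in pixels (the 150 px / 120 px values below are made-up examples, so measure your own scans). `body_crop_box` and `to_imagemagick_geometry` are hypothetical helper names, not part of Scan Tailor or any library.

```python
def body_crop_box(width, height, header_px, footer_px):
    """Return a (left, upper, right, lower) box that drops fixed header/footer bands.

    The tuple order matches what Pillow's Image.crop() expects.
    Assumes header_px/footer_px are measured in pixels from the top/bottom edges.
    """
    if header_px < 0 or footer_px < 0:
        raise ValueError("margins must be non-negative")
    if header_px + footer_px >= height:
        raise ValueError("header and footer bands cover the whole page")
    return (0, header_px, width, height - footer_px)


def to_imagemagick_geometry(box):
    """Convert a (left, upper, right, lower) box to ImageMagick's WxH+X+Y crop geometry."""
    left, upper, right, lower = box
    return f"{right - left}x{lower - upper}+{left}+{upper}"


if __name__ == "__main__":
    # Hypothetical page: 2480x3508 px (A4 at 300 dpi), 150 px header, 120 px footer.
    box = body_crop_box(2480, 3508, header_px=150, footer_px=120)
    print(box)                           # (0, 150, 2480, 3388)
    print(to_imagemagick_geometry(box))  # 2480x3238+0+150
```

The box can go straight into Pillow (`Image.open("page.tif").crop(box)`), or the geometry string can go to ImageMagick, e.g. `convert page.tif -crop 2480x3238+0+150 +repage body.tif` (`+repage` resets the virtual canvas after cropping).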
2 column source... luckily I rarely come across that either, although if I did I would probably do something similar (come up with an ImageMagick way to split the pages in half). I may contact you via PM for some examples soon (or you could always contact me).
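For the two-column case, one way to "split the pages in half" is to compute a crop box for each column and OCR them separately. This is a sketch under the assumption that the column break sits at the horizontal midpoint of the page; `split_columns` and the `gutter_px` parameter (pixels thrown away around the midline) are hypothetical names chosen for illustration.

```python
def split_columns(width, height, gutter_px=0):
    """Return (left_box, right_box) for a page split down the middle.

    Boxes are (left, upper, right, lower), the order Pillow's Image.crop() uses.
    gutter_px pixels centred on the midline are discarded from both halves.
    Assumes the column break is at the page's horizontal midpoint.
    """
    if gutter_px < 0:
        raise ValueError("gutter must be non-negative")
    mid = width // 2
    half_gutter = gutter_px // 2
    left_box = (0, 0, mid - half_gutter, height)
    # Give the right side the remainder so an odd gutter is still fully removed.
    right_box = (mid + (gutter_px - half_gutter), 0, width, height)
    if left_box[2] <= 0 or right_box[0] >= width:
        raise ValueError("gutter leaves an empty column")
    return left_box, right_box


if __name__ == "__main__":
    # Hypothetical 2480x3508 page with a 60 px gap between the columns.
    left, right = split_columns(2480, 3508, gutter_px=60)
    print(left)   # (0, 0, 1210, 3508)
    print(right)  # (1270, 0, 2480, 3508)
```

If the gutter doesn't matter, the commonly cited ImageMagick idiom `convert page.png -crop 50%x100% +repage col_%d.png` tiles each page into a left and right half (`col_0.png`, `col_1.png`) without any scripting.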