Why not try pdftotext? It is part of xpdf and a standard application on Linux (and probably other systems). It extracts whatever text is embedded in the PDF and writes it to a plain text file, avoiding the OCR and proofreading steps. You can even restrict it to a crop area by giving the top-left coordinate plus the width and height of the region to work on.
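For what it's worth, here is a rough sketch of calling it from a Python script. It assumes the `-x`, `-y`, `-W` and `-H` crop flags as found in the poppler build of pdftotext (other builds may differ, so check `pdftotext -h` first), and the helper names and file names are made up for illustration.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, x=None, y=None,
                        width=None, height=None, layout=False):
    """Build the pdftotext argument list (hypothetical helper).

    Assumption: the installed pdftotext accepts -x/-y (top-left corner)
    and -W/-H (width/height) for the crop area, as the poppler build does.
    """
    cmd = ["pdftotext"]
    if layout:
        # Try to keep the physical layout of the page in the output.
        cmd.append("-layout")
    crop = {"-x": x, "-y": y, "-W": width, "-H": height}
    for flag, value in crop.items():
        if value is not None:
            cmd += [flag, str(int(value))]
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path, **crop):
    """Run pdftotext; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, **crop), check=True)
```

Something like `extract_text("scan.pdf", "scan.txt", x=50, y=100, width=400, height=200)` would then write only the text inside that rectangle, provided the PDF actually contains a text layer.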
klaus