I've tried most of these methods, but the best so far is to open the PDF in an OCR program and generate a new text file from that. I then manually delete any headers and footers and fix any broken paragraphs. I haven't seen any of the common OCR problems yet, presumably because text rendered from a PDF is perfectly straight and free of scanner or paper noise.
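If the manual cleanup gets tedious, part of it can probably be scripted. What follows is only a rough sketch, not what I actually do: it assumes the OCR output is already available as one string per page, treats any line that recurs on at least half the pages (with digits masked, so "Page 1" matches "Page 2") as a header or footer, and rejoins lines inside a paragraph. The function names `strip_repeated_lines` and `join_broken_paragraphs` and the `min_fraction` parameter are made up for this example.

```python
import re
from collections import Counter


def _key(line):
    # Mask digits so running page numbers compare equal across pages.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(pages, min_fraction=0.5):
    """Drop lines that recur on many pages (likely headers or footers).

    `pages` is a list of strings, one per page (an assumption about
    how the OCR text was exported).
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({_key(l) for l in page.splitlines() if l.strip()})
    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {k for k, n in counts.items() if n >= threshold}
    return [
        "\n".join(l for l in page.splitlines() if _key(l) not in repeated)
        for page in pages
    ]


def join_broken_paragraphs(text):
    """Rejoin lines within a paragraph; blank lines still separate paragraphs."""
    # Rejoin words hyphenated across a line break. This also merges
    # genuine compounds such as "well-\nknown", so review the result.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn a lone newline (not part of a blank line) into a space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)
```

A quick usage example with two made-up pages:

```python
pages = [
    "My Book Title\nThe quick brown fox jum-\nped over the lazy\ndog.\n\nSecond paragraph.\nPage 1",
    "My Book Title\nAnother line of\ntext here.\nPage 2",
]
cleaned = [join_broken_paragraphs(p) for p in strip_repeated_lines(pages)]
```

The result still needs a read-through, since a body line that happens to repeat across pages would be dropped as well.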