I guess the best approach would be to run the PDF through a sophisticated OCR engine. They handle PDFs nowadays, too, not just scans. Their character recognition rate is excellent, but it is mostly their layout recognition that is going to help here. From there you can export to document formats, e.g. MS Office.