Quote:
Originally Posted by JSWolf
There is no way to do a novel length conversion from PDF without errors. OCR can be better if you correct any issues the OCR flags as it does its thing. But I do agree that you need a full A/B comparison to make sure it's correct.
Yes. Absolutely right. We hit 99.7% accuracy, but that's about as good as it gets. We correct the OCR issues during the run, we do a full A/B comparison afterwards, AND we give the file to the client for proofing. Until Adobe decides to play ball on the HTML-export functions for PDFs (or, hell, even Word or XML export), that's about as good as it will get, IMHO. Someone in another thread, somewhere, claimed that you could get good results using Acrobat Pro X to crop the headers/footers, export, and then do something (I don't recall what) with Calibre, but... I'd have to see the materials to be convinced. That's nothing against Calibre; my doubt is about the initial export from Acrobat Pro X. I've NEVER seen clean HTML from Pro X, or at least not HTML that wouldn't take longer to clean up than it takes to go the long route: scan and OCR.
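For anyone wondering what an accuracy figure and an A/B pass could look like in practice, here is a rough sketch in Python using only the standard library's difflib. It assumes accuracy means the share of reference characters the OCR text reproduces, which is only one possible definition, and the names char_accuracy and ab_diff are made up for illustration. It is not a description of any particular shop's tooling.

```python
import difflib


def char_accuracy(reference: str, ocr_text: str) -> float:
    """Fraction of reference characters that also appear, in order, in the OCR text.

    Assumption: "accuracy" here is character-level; other definitions exist.
    """
    if not reference:
        return 1.0
    # autojunk=False stops difflib from ignoring very common characters
    # in long inputs, which would skew the count.
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def ab_diff(reference: str, ocr_text: str) -> list[str]:
    """Return only the lines that differ, marked '- ' (reference) or '+ ' (OCR)."""
    diff = difflib.ndiff(reference.splitlines(), ocr_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    reference = "The quick brown fox\njumps over the lazy dog"
    ocr_text = "The quick brovvn fox\njumps over the lazy dog"  # 'w' misread as 'vv'
    print(f"accuracy: {char_accuracy(reference, ocr_text):.4f}")
    for line in ab_diff(reference, ocr_text):
        print(line)
```

SequenceMatcher gets slow on very long inputs, so on a novel-length file you would probably run this a page or a chapter at a time. It also only tells you where the two versions differ; someone still has to decide which side is right.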
Just my $.02.
Hitch