06-11-2012, 06:27 AM   #94
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by JSWolf
There is no way to do a novel length conversion from PDF without errors. OCR can be better if you correct any issues the OCR flags as it does its thing. But I do agree that you need a full A/B comparison to make sure it's correct.
Yes. Absolutely right. We hit 99.7%, but that's about as good as it gets. We correct the OCR issues during the OCR pass, we do a full A/B comparison afterwards, AND we give the file to the client for proofing. Until Adobe decides to play ball on the HTML-export functions for PDFs (or, hell, even Word, or XML), that's about as good as it will get, IMHO.

Someone in another thread, somewhere, claimed that you could get good results using Acrobat Pro X to crop the headers/footers, export, and then do (something--don't recall) with Calibre, but... I'd have to see the materials to be convinced. That's nothing against Calibre; my question is about the initial export from Acrobat Pro X. I've NEVER seen clean HTML from Pro X--at least, not HTML that wouldn't take longer to clean up than it takes to go the long route: scan & OCR.
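
For anyone who wants to see what an automated A/B pass might look like, here is a rough sketch in Python using the standard-library difflib module. It is only an illustration, not the workflow described above: it assumes you already have a trusted reference text to compare against (with a scanned book you usually don't, which is why the comparison is done by eye against the page), and the function names `similarity` and `word_diffs` and the sample strings are made up for the example. The similarity figure is difflib's own ratio, which is not necessarily how a number like 99.7% would be measured.

```python
import difflib


def similarity(reference: str, ocr_text: str) -> float:
    """Character-level similarity in [0, 1], as defined by difflib's ratio()."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    return matcher.ratio()


def word_diffs(reference: str, ocr_text: str) -> list[tuple[str, str]]:
    """Return (reference_words, ocr_words) pairs wherever the texts disagree."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Collect the disagreeing run of words from each side.
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs


# Hypothetical sample: classic "rn" misread as "m" in OCR output.
reference = "The modern reader turned the corner slowly."
ocr_text = "The modem reader tumed the comer slowly."

print(f"similarity: {similarity(reference, ocr_text):.3f}")
for expected, got in word_diffs(reference, ocr_text):
    print(f"expected {expected!r}, OCR gave {got!r}")
```

Note that a spell checker would pass "modem" and "comer" without complaint, since both are real words; only a side-by-side comparison catches them. That is one reason a full A/B pass is still needed after correcting whatever the OCR engine flags.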

Just my $.02.
Hitch