Quote:
Originally Posted by Timber
There are a lot of images in the book and I really want to keep the ones that are actually images in the original. I just want text to be treated as text so I don't end up with 50 MB+ .LRF files.
/sigh This sucks, because even though the PDFs are big (5 MB to 10 or 12 MB), the LRFs are huuuuuuuuuge.
I finally had a chance to try the OCR in Acrobat. As chaley says, it leaves multiple tiny images of the text in place, so the resulting PDF is highly readable: all you see are the original images of the text.
Selecting the text and pasting it into a .txt document shows only the OCR'd text. In my tests, the results were pretty bad: as pure OCR'd text it was only marginally readable. Headings in a different, italicized font were completely unreadable, some words were split up, etc.
I suspect there is a site somewhere that will tell you how to remove all the text images and replace them with the associated OCR'd true text. There must be some way to do it. I hoped I'd find such a feature in Acrobat, but so far, no luck. Even if I found it, the result would take a lot of work to clean up.
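For what it's worth, a small part of that cleanup could probably be scripted once the OCR'd text has been pasted into a plain text file. Below is a rough Python sketch, not anything Acrobat provides: `clean_ocr_text` is a made-up helper name, and it assumes that paragraphs are separated by blank lines and that split words show up as a hyphen at the end of a line. It will not fix misrecognized characters such as the unreadable italic headings.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy text pasted out of an OCR'd PDF (hypothetical helper)."""
    # Normalize Windows/old-Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words hyphenated across a line break: "read-\nable" -> "readable".
    # Caveat: a genuine compound broken at the hyphen loses its hyphen too.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat one or more blank lines as a paragraph boundary (an assumption).
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Collapse runs of whitespace, including single newlines, to one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)

    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The text is read-\nable  on\nscreen.\n\n\nNext   para."
    print(clean_ocr_text(sample))
```

This only handles layout damage (broken lines, stray spaces, end-of-line hyphens), so anything the OCR actually misread would still need fixing by hand.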