MobileRead Forums - View Single Post - small PDFs becoming huge LRFs when converted

Timber · 08-24-2010, 09:34 AM

Quote:

Originally Posted by chaley

I think that Acrobat's OCR leaves the images in place, associating the recognized text with the characters it came from as some kind of overlay. This is why you can sometimes search text in PDFs that are obviously images. There was a thread some time back about Greek characters in documents that demonstrated this. When looking at the PDF, one saw Greek, but ebooks made using the OCRed text had garbage in the same spot.

Try saving the OCRed PDF as text. That will get rid of the images. You could also try HTML.

Yep, except that there are a lot of images in the book, and I really want to keep the ones that are actually images in the original. I just want text to be treated as text so I don't end up with 50 MB+ .LRF files.

/sigh This sucks, because even though the PDFs are big (5 MB to 10 or 12 MB), the LRFs are huuuuuuuuuge.