MobileRead Forums - View Single Post - I have a 350mb PDF. How do I best convert this?

DNSB · 02-12-2019, 12:41 AM

Quote:

Originally Posted by Ghitulescu

I am curious too why it does this way.
Yes, I have read the FAQ, the Sticky and some of the related threads, yet no clear answer why:

... so I have too a PDF coming out of scanner.
In Reader I can select the text, and it's rather correct. So the PDF file also contains text, not only images.
Yet, calibre outputs a bunch of images, one per page.

So, again, why calibre does not use the "hidden text"?
Yes, I know it's not best to use PDFs... but what to do when the only source is one of them?!

I have one commercially produced scan of a book from the 1870's* The images are not the greatest but they are what the original book looked like. The text plane is useful for searching but when I take a close look at it, it has a multitude of OCR errors which make it painful to read.

* The book was originally a two volume set which includes household tips (use sulphuric acid on your windows to prevent frost), photography including creating your own wet plates and much else. If I was stranded on a desert island, that is a set of books that would be handy though by today's standards, many of the items would be considered extreme safety hazards.