MobileRead Forums - View Single Post

DuckieTigger · 09-08-2018, 06:16 AM

Quote:

Originally Posted by sealbeater

No assumptions are being made: PDFs are either one or the other. I don't disagree that you would have to do a two-stage run on the PDF to get the best automatic result. However, I've never seen a PDF that had both.

No, they are not either/or. Even a PDF that contains text has full-page images: you simply create them by printing the PDF into an individual image for each page. OCR on those images has a better chance of succeeding than relying on the embedded text, which may be horribly garbled and won't tell you where the header is, for example.
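A rough sketch of the "print to images, then OCR" idea, in case it helps. It assumes the Poppler `pdftoppm` and Tesseract `tesseract` command-line tools are installed and on the PATH; the function names (`render_cmd`, `ocr_cmd`, `ocr_pdf`) and the 300 dpi default are just illustrative choices, not anything a particular tool requires. Asking Tesseract for `tsv` output keeps each word's position on the page, which is what you would need to spot a header.

```python
import subprocess
from pathlib import Path


def render_cmd(pdf, out_prefix, dpi=300):
    """Build the pdftoppm command that renders every page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def ocr_cmd(image, out_base):
    """Build the tesseract command; 'tsv' output includes word bounding boxes."""
    return ["tesseract", str(image), str(out_base), "tsv"]


def ocr_pdf(pdf, workdir):
    """Render each page of the PDF to an image, then OCR every image.

    Returns the list of .tsv files written by tesseract, one per page.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Stage 1: one PNG per page, named page-<n>.png by pdftoppm.
    subprocess.run(render_cmd(pdf, workdir / "page"), check=True)

    # Stage 2: OCR each page image, ignoring any text embedded in the PDF.
    results = []
    for image in sorted(workdir.glob("page-*.png")):
        out_base = image.with_suffix("")  # tesseract appends ".tsv" itself
        subprocess.run(ocr_cmd(image, out_base), check=True)
        results.append(out_base.with_suffix(".tsv"))
    return results
```

Calling `ocr_pdf("book.pdf", "ocr_out")` (hypothetical file names) would leave one PNG and one TSV per page in `ocr_out`. The `top` and `height` columns in each TSV could then be used to guess which lines sit in the header area.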