08-09-2019, 01:04 PM #3
dwig
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by fredthefork
Hello! I'm new to this so please forgive me if this is basic knowledge.

I have a PDF file which is OCRed. I would like to convert it to epub. ...
What am I missing here? This can't be so difficult, can it?
Yes.

One, "cropping" tools like Briss don't delete anything. They just set a new page size for viewing. The old data is still there; it's just off the page and out of view.

Two, the PDF was OCRed before it was cropped. The headers and similar "junk" are still in the text layer created by the OCR process, so they are still "visible" to the format converter and end up in the ePub.

You might be more successful if you "crop" the PDF first and then do the OCR. This might prevent the OCR process from "seeing" the parts that were trimmed.
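To make the order-of-operations point concrete, here is a toy Python model. It is not how Briss or any real OCR program is implemented. In this model a page keeps its scanned content plus a hidden text layer, and "cropping" only changes the visible crop box. The hypothetical `ocr()` function is assumed to read only what lies inside the current crop box. Whether a real OCR tool behaves that way depends on the tool. All names here (`Box`, `Page`, `crop`, `ocr`, `extract_text`) are made up for illustration.

```python
from dataclasses import dataclass, replace

Item = tuple[float, float, str]  # (x, y, text) in page coordinates


@dataclass(frozen=True)
class Box:
    """Rectangle in page units, origin at bottom-left (as in PDF)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Page:
    media_box: Box                  # the full page
    crop_box: Box                   # the part a viewer shows
    printed: tuple[Item, ...]       # what is drawn in the scanned image
    text_layer: tuple[Item, ...] = ()  # hidden OCR text


def crop(page: Page, box: Box) -> Page:
    # Like a cropping tool: only the visible box changes; nothing is deleted.
    return replace(page, crop_box=box)


def ocr(page: Page) -> Page:
    # Toy OCR (assumption): recognizes only what is inside the crop box.
    seen = tuple(i for i in page.printed if page.crop_box.contains(i[0], i[1]))
    return replace(page, text_layer=seen)


def extract_text(page: Page) -> list[str]:
    # Toy converter: reads the whole text layer and ignores the crop box.
    return [text for _, _, text in page.text_layer]


full = Box(0, 0, 612, 792)
body = Box(0, 72, 612, 720)  # hypothetical crop trimming header/footer bands
scan = Page(full, full, (
    (300, 760, "RUNNING HEADER"),
    (300, 400, "Body paragraph."),
    (300, 30, "Page 17"),
))

ocr_then_crop = crop(ocr(scan), body)  # what the original poster did
crop_then_ocr = ocr(crop(scan, body))  # the suggested order

print(extract_text(ocr_then_crop))  # ['RUNNING HEADER', 'Body paragraph.', 'Page 17']
print(extract_text(crop_then_ocr))  # ['Body paragraph.']
```

In both orders the cropped page still holds all of the scanned content. Only the text layer differs, and only when OCR happens after the crop does the header and footer text stay out of it.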