MobileRead Forums - View Single Post - Can you OCR the images inside of .pdf files?

grumbles · 10-30-2014, 09:07 PM

I didn't attempt to wade through all the previous noise in this thread, but this is how I have converted PDF files to EPUBs.

I use PDFill (a free Windows program) to convert the PDF file to images (usually at 600 dpi).
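For a sense of how large those 600 dpi images get: a PDF page size is measured in points, and one point is 1/72 inch. The rendered size in pixels is therefore points × dpi / 72. Here is a minimal Python sketch of that arithmetic. The function name is hypothetical, and it only does the math; it does not script PDFill, which is a GUI tool.

```python
# PDF page sizes are expressed in points; one point is 1/72 inch.
POINTS_PER_INCH = 72


def page_size_in_pixels(width_pt: float, height_pt: float, dpi: int = 600) -> tuple[int, int]:
    """Return the (width, height) in pixels of a PDF page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(page_size_in_pixels(612, 792))       # (5100, 6600)
    print(page_size_in_pixels(612, 792, 300))  # (2550, 3300)
```

Doubling the dpi doubles each dimension, so a 600 dpi page holds four times as many pixels as a 300 dpi one. That extra detail is useful for OCR, but it also makes each image file much larger.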

I then run the images through ScanTailor to make the pages uniform. In particular, I want all the headings and/or page numbers to be the same distance from the top and bottom edges. Where a page has no heading or page number, I make sure there is an equivalent amount of blank space.
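The idea behind making the pages uniform can be illustrated with a much-simplified Python sketch. This is not how ScanTailor works internally; it also does no deskewing or content detection. The sketch models a page as a list of grayscale pixel rows. It strips whatever blank space surrounds the printed area, then pads the page so the first and last inked rows sit at fixed distances from the edges. The names `normalize_margins` and `is_blank_row` and the threshold of 128 are all hypothetical. For a page with no heading or page number, you would pass a `top` (or `bottom`) larger by the height of the missing band, which gives the "equivalent blank space".

```python
# A page is modelled as a list of pixel rows; each pixel is a grayscale
# value from 0 (black) to 255 (white). This stands in for a real image.
Page = list[list[int]]


def is_blank_row(row: list[int], threshold: int = 128) -> bool:
    """A row counts as blank if no pixel is darker than `threshold`."""
    return all(px >= threshold for px in row)


def normalize_margins(page: Page, top: int, bottom: int, threshold: int = 128) -> Page:
    """Return a copy of `page` whose first and last non-blank rows sit exactly
    `top` rows below the top edge and `bottom` rows above the bottom edge."""
    width = len(page[0]) if page else 0
    blank_row = [255] * width
    inked = [i for i, row in enumerate(page) if not is_blank_row(row, threshold)]
    if not inked:
        # Nothing printed on the page: keep only the requested blank margins.
        return [blank_row[:] for _ in range(top + bottom)]
    # Keep everything from the first inked row through the last one.
    content = [row[:] for row in page[inked[0]:inked[-1] + 1]]
    return (
        [blank_row[:] for _ in range(top)]
        + content
        + [blank_row[:] for _ in range(bottom)]
    )


if __name__ == "__main__":
    W, B = 255, 0
    page = [[W, W, W], [W, B, W], [W, W, W], [B, W, W], [W, W, W], [W, W, W]]
    for row in normalize_margins(page, top=2, bottom=1):
        print(row)
```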

I then use XnView to trim off the headings and/or page numbers.
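Once every page has its heading and page number in the same place, a single crop box works for the whole batch. A minimal Python sketch of computing that box follows. The 0.5-inch band heights in the example are an assumption chosen for illustration, and `body_crop_box` and `inches_to_px` are hypothetical helper names. The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` takes, so with Pillow installed it could be passed to `img.crop(...)`. XnView's own batch crop is configured in its GUI instead.

```python
def inches_to_px(inches: float, dpi: int = 600) -> int:
    """Convert a distance on the printed page to pixels at `dpi`."""
    return round(inches * dpi)


def body_crop_box(width: int, height: int, header_px: int, footer_px: int) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box that keeps the full page width
    but drops `header_px` rows at the top and `footer_px` rows at the bottom."""
    if header_px < 0 or footer_px < 0:
        raise ValueError("band heights must be non-negative")
    if header_px + footer_px >= height:
        raise ValueError("header and footer bands would remove the whole page")
    return (0, header_px, width, height - footer_px)


if __name__ == "__main__":
    # A US Letter page rendered at 600 dpi is 5100 x 6600 pixels.
    # Hypothetical layout: a 0.5-inch heading band and a 0.5-inch page-number band.
    band = inches_to_px(0.5)                      # 300 px at 600 dpi
    print(body_crop_box(5100, 6600, band, band))  # (0, 300, 5100, 6300)
```

This shortcut only works because the previous step put the bands in the same place on every page. Without that step, each page would need its own crop box.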

Then I run the OCR program on the images. I use ABBYY FineReader Sprint, which came with my scanner.

This works well and goes fairly quickly. I've used this on several PDF files I've downloaded from the Internet Archive.
