MobileRead Forums - View Single Post - Disabling OCR while converting a PDF to AZW3 (or, other kindle friendly formats)

dwig · 05-25-2016, 10:21 PM

As theducks said, calibre does not do OCR, so the PDF must already contain an OCR text layer in addition to the bitmaps of the scanned pages, and calibre is using that text in its conversion.

I don't know if there is a way in calibre to force it to ignore the text and use only the bitmaps, but there is a workaround. A "virtual printer" utility installs a printer driver that streams its data to a PDF file instead of to a hardware device. If you print the original PDF through one, the resulting PDF will be "flattened", with only a bitmap for each page. You would then import this "flattened" PDF into calibre and perform the PDF>AZW3 conversion on it.

I've used both Bullzip PDF Printer and PrimoPDF on Windows for similar tasks in the past. My issues were with PDFs that used partially transparent objects that became opaque and obscured the text below them, but the technique should work just as well for the OP's problem. I would think that Mac OS X's built-in service for printing to PDF (found in its normal print dialog) should work as well.
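For anyone who would rather script the same flatten-then-convert idea than click through a print dialog, here is a rough sketch. It is not the virtual-printer route described above: it swaps in Ghostscript's `pdfimage24` output device, which re-renders each page as an image, and then hands the result to calibre's `ebook-convert` command-line tool. The function names, the `-flat.pdf` suffix and the 300 dpi default are made up for illustration, and it assumes both `gs` and `ebook-convert` are on the PATH (on Windows the Ghostscript console executable usually has a different name, such as `gswin64c`). I have not checked it against the OP's file.

```python
import shutil
import subprocess
from pathlib import Path


def flatten_command(src, dst, dpi=300):
    """Build a Ghostscript command that re-renders every page as an image.

    The pdfimage24 device writes a PDF whose pages are 24-bit bitmaps,
    so a hidden OCR text layer is not carried over into the output.
    """
    return [
        "gs",  # assumption: Ghostscript is installed under this name
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=pdfimage24",
        f"-r{dpi}",  # render resolution in dots per inch
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_command(src, dst):
    """Build the calibre command; the output format follows dst's extension."""
    return ["ebook-convert", str(src), str(dst)]


def flatten_and_convert(pdf, dpi=300):
    """Flatten pdf to an image-only copy, then convert that copy to AZW3."""
    pdf = Path(pdf)
    flat = pdf.with_name(pdf.stem + "-flat.pdf")  # hypothetical naming scheme
    book = pdf.with_suffix(".azw3")
    for tool in ("gs", "ebook-convert"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    subprocess.run(flatten_command(pdf, flat, dpi), check=True)
    subprocess.run(convert_command(flat, book), check=True)
    return book
```

Calling `flatten_and_convert("scan.pdf")` would write `scan-flat.pdf` next to the original and then `scan.azw3`. A lower `dpi` keeps the intermediate file smaller at the cost of page sharpness.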
