05-25-2016, 02:11 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2016
Device: Kindle Paperwhite
|
Disabling OCR while converting a PDF to AZW3 (or other Kindle-friendly formats)
I have some scanned PDF books that I want to read on my Kindle Paperwhite.
On the Kindle, PDF files have no thumbnails (covers are not shown in the library). So, to get thumbnails, I wanted to convert them to AZW3, since that format supports book covers as thumbnails in the Kindle's My Library. But when I tried to convert a PDF, OCR automatically tried to generate text from the scanned pages, which I did not like because it messes up the original formatting. Is there any way to disable OCR during the conversion and keep the scanned page images from the PDF in the AZW3 (or any other Kindle thumbnail-friendly format)? I have noticed that if the PDF book is not in English, the OCR does not kick in and the scanned pages are kept in the AZW3, which is exactly what I want for English books too! |
05-25-2016, 04:05 PM | #2 |
Well trained by Cats
Posts: 29,792
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Calibre does NOT (can't) do OCR |
05-25-2016, 10:21 PM | #3 |
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
As theducks said, calibre does not do OCR, so it must be that the PDF contains OCR text in addition to the bitmaps of the scanned pages, and calibre is using this text in its conversion.
I don't know if there is a way in calibre to force it to ignore the text and use only the bitmaps. There is a workaround. If you use a "virtual printer" utility (one that installs a printer driver which streams its data to a file in PDF format instead of to a hardware device), the resulting PDF will be "flattened", with only a bitmap for each page. You would then import this "flattened" PDF into calibre to perform the PDF>AZW3 conversion. I've used both Bullzip PDF Printer and PrimoPDF on Windows for similar tasks in the past. My issues had been with PDFs that used partially transparent objects that became opaque and obscured the text below them, but the technique should work as well for the OP's problem. I would think that Mac OS X's built-in service for printing to PDF (found in its normal print dialog) should work as well. |
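The flattening workaround above can also be scripted instead of going through a print dialog. What follows is only a minimal sketch, not something anyone in this thread has posted: it assumes Poppler's `pdftoppm`, the `img2pdf` command-line tool, and calibre's `ebook-convert` are installed and on the PATH. The function names, the file names, and the 150 DPI default are all placeholders chosen for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def flatten_commands(src, page_prefix, dpi=150):
    """Return the pdftoppm command that renders every page of src to PNG.

    Rendering to bitmaps drops any hidden OCR text layer, which is the
    same effect the "virtual printer" trick is meant to achieve.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(src), str(page_prefix)]


def rebuild_command(page_images, flat_pdf):
    """Return the img2pdf command that packs the page images into one PDF."""
    return ["img2pdf", *map(str, page_images), "-o", str(flat_pdf)]


def convert_command(flat_pdf, out_book):
    """Return the calibre command for the PDF>AZW3 conversion."""
    return ["ebook-convert", str(flat_pdf), str(out_book)]


def page_number(path):
    """Extract the page number from a name like 'page-007.png'."""
    return int(Path(path).stem.rsplit("-", 1)[1])


def flatten_and_convert(src, out_book, dpi=150):
    """Run all three steps; needs the external tools to be installed."""
    for tool in ("pdftoppm", "img2pdf", "ebook-convert"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(flatten_commands(src, prefix, dpi), check=True)
        # Sort numerically so page order is kept however the names are padded.
        pages = sorted(Path(tmp).glob("page-*.png"), key=page_number)
        flat_pdf = Path(tmp) / "flat.pdf"
        subprocess.run(rebuild_command(pages, flat_pdf), check=True)
        subprocess.run(convert_command(flat_pdf, out_book), check=True)
```

Calling `flatten_and_convert("scan.pdf", "scan.azw3")` would run the whole chain. A higher DPI gives sharper pages at the cost of a larger file, so the value is worth tuning per book.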
05-25-2016, 10:22 PM | #4 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
calibre does not do OCR -- if you are seeing text in the output, then it was present in the input.
|
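One way to confirm that the text really is in the input is to extract it from the PDF directly. This is a rough sketch that assumes Poppler's `pdftotext` is installed; the helper names and the 20-character threshold are arbitrary choices for illustration, not anything calibre itself uses.

```python
import shutil
import subprocess


def has_text_layer(extracted_text, min_chars=20):
    """Heuristic: the PDF carries a text layer if extraction yields at
    least min_chars non-whitespace characters."""
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def extract_text(pdf_path):
    """Dump the PDF's text to a string with pdftotext ("-" means stdout)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

If `has_text_layer(extract_text("scan.pdf"))` is true, the scan already contains OCR text alongside the page images, and that is what ends up in the converted book.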
05-26-2016, 04:42 AM | #5 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2016
Device: Kindle Paperwhite
|
Is there any way to remove the OCR text from the PDF files so that my AZW3 files contain only the scanned book pages?
|
05-30-2016, 08:11 AM | #6 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
A quick solution on Windows is to print the PDF to XPS (Microsoft XPS Document Writer, a driver that is part of Windows) and then convert the XPS file back to PDF with your external PDF converter.
|