Old 05-25-2016, 02:11 PM   #1
noduskfever
Junior Member
Posts: 2
Karma: 10
Join Date: May 2016
Device: Kindle Paperwhite
Disabling OCR while converting a PDF to AZW3 (or other Kindle-friendly formats)

I have some scanned PDF books that I want to read on my Kindle Paperwhite.

On the Kindle, PDF files get no thumbnails (their covers are not shown in the library).

So, to get thumbnails, I wanted to convert them to AZW3, since that format shows book covers as thumbnails in the Kindle's My Library. But when I tried to convert a PDF, the OCR automatically tried to generate text from the scanned pages, which I did not like because it messes up the original formatting.

Is there any way to disable the OCR during the conversion and keep the scanned page images from the PDF in the AZW3 (or any other Kindle format that supports thumbnails)?

I have noticed, though, that if the PDF book is not in English, the OCR does not work and the scanned pages are kept in the AZW3. That is exactly what I want for English books too!
Old 05-25-2016, 04:05 PM   #2
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A

Calibre does NOT (can't) do OCR
Old 05-25-2016, 10:21 PM   #3
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
As theducks said, calibre does not do OCR, so the PDF must contain OCR text in addition to the bitmaps of the scanned pages, and calibre is using this text in its conversion.

I don't know if there is a way in calibre to force it to ignore the text and use only the bitmaps. There is a workaround, though. If you use a "virtual printer" utility (a printer driver that streams its data to a PDF file instead of to a hardware device), the resulting PDF will be "flattened", with only a bitmap for each page. You would then import this "flattened" PDF into calibre to perform the PDF>AZW3 conversion.

I've used both Bullzip PDF Printer and PrimoPDF on Windows for similar tasks in the past. My issues were with PDFs whose partially transparent objects became opaque and obscured the text below them, but the technique should work just as well for the OP's problem. I would think that Mac OS X's built-in print-to-PDF service (found in its normal print dialog) should work as well.
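
For anyone who prefers to script the final PDF>AZW3 step rather than use the GUI, here is a rough sketch of how it might look with calibre's ebook-convert command-line tool. The file names (flattened.pdf, book.azw3, cover.jpg) are made up for illustration, and the sketch assumes ebook-convert is on your PATH. It only builds the command; hand the list to subprocess.run() to actually run it.

```python
def build_convert_command(pdf_path, azw3_path, cover_path=None):
    """Build an ebook-convert command line for a PDF -> AZW3 conversion.

    All paths are hypothetical examples supplied by the caller.
    """
    cmd = ["ebook-convert", str(pdf_path), str(azw3_path)]
    if cover_path is not None:
        # --cover sets the cover image used for the book's thumbnail
        cmd += ["--cover", str(cover_path)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is converted by accident.
    print(" ".join(build_convert_command("flattened.pdf", "book.azw3", "cover.jpg")))
```

The explicit cover is optional. It is included here only because the thumbnail was the reason for converting in the first place.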
Old 05-25-2016, 10:22 PM   #4
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
calibre does not do OCR -- if you are seeing text in the output, then it was present in the input.
Old 05-26-2016, 04:42 AM   #5
noduskfever
Junior Member
Posts: 2
Karma: 10
Join Date: May 2016
Device: Kindle Paperwhite
Is there any way to clear the OCR text from the PDF files so that I have only the scanned book pages in my AZW3 files?
Old 05-30-2016, 08:11 AM   #6
Divingduck
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
A quick solution on Windows is to print the PDF to XPS (Microsoft XPS Document Writer; this driver is part of Windows) and then convert the XPS file back to PDF with your external PDF converter.
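
A different route that nobody in this thread mentioned, offered only as a sketch: Ghostscript (version 9.23 or later) has a -dFILTERTEXT switch that drops text objects when rewriting a PDF, which should leave just the scanned page images. The script below assumes Ghostscript is installed and that its executable is called "gs" (on Windows it is usually gswin64c). The file names are hypothetical.

```python
import shutil
import subprocess


def build_strip_text_command(src, dst, gs_binary="gs"):
    """Return the Ghostscript argv that rewrites src to dst without text objects."""
    return [
        gs_binary,
        "-o", str(dst),        # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",   # write a new PDF rather than render to an image file
        "-dFILTERTEXT",        # drop text objects (needs Ghostscript 9.23 or later)
        str(src),
    ]


def strip_text_layer(src, dst, gs_binary="gs"):
    """Run Ghostscript if it is installed; return True when the command was run."""
    if shutil.which(gs_binary) is None:
        return False
    subprocess.run(build_strip_text_command(src, dst, gs_binary), check=True)
    return True


if __name__ == "__main__":
    # Only print the command here; call strip_text_layer() to actually run it.
    print(" ".join(build_strip_text_command("scanned.pdf", "scanned-images-only.pdf")))
```

Check the output PDF before converting it: the pages should look the same, but selecting or searching text should no longer find anything.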
All times are GMT -4.