MobileRead Forums - no text extraction for pdf with images and OCR

fxp33 · 05-01-2013, 01:42 PM

Hi,

I tried to convert PDF files in which the recognised (OCR) text is embedded in the original page images of the book, for example: http://www.freidok.uni-freiburg.de/v...f_der_Zahl.pdf

As you can see, the PDF is made of pictures (page images), but you can select the text inside them, and even do a "copy all" and paste it into a text editor.

If I convert the PDF to EPUB with --no-images, the EPUB contains no text at all.
If I convert with images, the EPUB contains only the (reduced) page images.
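
For reference, the two conversions above correspond roughly to the calls sketched below. This is only a sketch: it assumes calibre's `ebook-convert` command-line tool is on the PATH, and the function names are made up for illustration. The only option used is `--no-images`, as mentioned above.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, epub_path, keep_images=True):
    """Build the ebook-convert argument list for a PDF to EPUB conversion."""
    command = ["ebook-convert", pdf_path, epub_path]
    if not keep_images:
        # --no-images is the option used in the first conversion described above.
        command.append("--no-images")
    return command


def convert(pdf_path, epub_path, keep_images=True):
    """Run the conversion; needs calibre's ebook-convert on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(
        build_convert_command(pdf_path, epub_path, keep_images),
        check=True,
    )
```

Calling `convert("book.pdf", "book.epub", keep_images=False)` would reproduce the first case, and `keep_images=True` the second; `book.pdf` and `book.epub` are placeholder file names.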

Is there a way to get the text of such a PDF without the images of the pages?

(calibre version 0.9.28; Windows XP SP3; Adobe PDF 10.1.4)

Thanks for your help

François
