05-01-2013, 01:42 PM   #1
fxp33
Addict
Posts: 261
Karma: 110864
Join Date: Mar 2013
Location: Bordeaux, France
Device: Kobo Glo, Aura HD, Kindle Paperwhite
No text extracted from PDF made of page images with OCR text

Hi,

I tried to convert PDF files in which the recognised (OCR) text is embedded inside the original page images of the book, for example: http://www.freidok.uni-freiburg.de/v...f_der_Zahl.pdf

As you can see, the PDF is made of pictures (page images), but you can select the text inside them, and even do a "copy all" and paste the result into a text editor.

If I convert the PDF to EPUB with --no-images, the EPUB contains absolutely no text.
If I convert with images, the EPUB contains only the (reduced) page images.

Is there a way to get the text of such a PDF without the images of the pages?
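
To show what I mean by "the text without the images", here is a minimal sketch of pulling out only the text layer with an external tool. It assumes Poppler's pdftotext command-line program is installed and on the PATH (that is my assumption, it is not part of calibre), and the function names are made up for illustration. I have not confirmed that it handles this particular file.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for Poppler's pdftotext (assumed installed)."""
    # -enc UTF-8 asks pdftotext to write the output file as UTF-8,
    # which keeps accented characters intact.
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def extract_text_layer(pdf_path):
    """Write the PDF's text layer to a .txt file next to the PDF; return its path."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) was not found on PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path
```

If that produces a usable text file, it could presumably be fed to calibre as a plain-text input instead of the PDF, but I would prefer a way to do it directly in the PDF conversion.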

(calibre version 0.9.28; Windows XP SP3; Adobe PDF 10.1.4)

Thanks for your help

François