MobileRead Forums - View Single Post - Question about OCRd djvu and pdf with ABBYY

BranMakMorn · 10-25-2011, 08:29 AM

Hello everyone,

I've got a nagging problem that I couldn't solve by browsing this section of the forum. Here it is: I have some books in .djvu format that I want to convert to .pdf while PRESERVING THE OCR, so that I can read and annotate them on an iPad.

Now, I can of course open the djvu with ABBYY FineReader: it will scan the whole document and recognize the text, usually doing a very good job.

BUT. When I produce the OCRd .pdf, it is a 'copy' of the original text, not the page as it was. In other words: I don't want a 're-typed' copy of the book (not least because ABBYY does an awful job with numbered footnotes); I want to keep the EXACT same look as the printed book (fonts, spacing... everything).

Of course, I can achieve this if I simply 'print' the djvu file to a .pdf. But if I do this, I lose the searchable text: the result is just an image.

So the question is: is there any way to convert a djvu file to .pdf while preserving BOTH the OCRd text (searchability) AND the overall look of the page?

Thank you!
