Quote:
Originally Posted by DaleDe
It is possible to get the text out of a djvu document but you would lose all the formatting.
You can also OCR the pages.
Dale
One problem with the "text layer" in many DJVUs is that it contains misspellings and sometimes horrendous mangling of the text, caused by the original OCR. I've had some with entire pages of the text layer missing. It is often OK for the intended purpose of locating text in the main layer, but it needs massive work to recover the "original" text.
Re-OCRing the pages might produce better results but that will depend on the quality of the images that make up the DJVU.
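As a rough illustration of how you might decide which pages are worth re-OCRing, here is a minimal sketch. It assumes you have already pulled the text layer out one page at a time (for example with DjVuLibre's djvutxt) into a list of strings. The function name suspect_pages and both thresholds are my own invention for the example, not part of any tool, and the "word-like" test is a crude heuristic that will need tuning for your books.

```python
def suspect_pages(page_texts, min_chars=20, min_wordlike=0.6):
    """Return 1-based numbers of pages whose text layer looks missing or mangled.

    page_texts: list of strings, one per page (assumed already extracted).
    min_chars, min_wordlike: hypothetical thresholds, tune for your material.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        stripped = text.strip()
        if len(stripped) < min_chars:
            # Text layer missing or nearly empty for this page.
            flagged.append(number)
            continue
        tokens = stripped.split()
        wordlike = 0
        for token in tokens:
            # Drop common surrounding punctuation, then allow internal
            # hyphens and apostrophes; what remains must be letters only.
            core = token.strip(".,;:!?\"'()")
            core = core.replace("-", "").replace("'", "")
            if core.isalpha():
                wordlike += 1
        if wordlike / len(tokens) < min_wordlike:
            # Too many tokens with digits or symbols: probably mangled OCR.
            flagged.append(number)
    return flagged


pages = [
    "The quick brown fox jumps over the lazy dog.",
    "",
    "Th3 qu1ck br0wn f0x j|mps 0v3r th~ l@zy d0g",
]
print(suspect_pages(pages))  # prints [2, 3]
```

Pages that really are mostly numbers (tables, indexes) will be flagged too, so treat the result as a list of pages to look at, not a verdict.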
I have done a couple of these (building FB2 and EPUB rather than PDF, but the principle is the same) and I can say it is not a job for the faint-hearted - there is an awful lot of manual work needed.
BobC