09-24-2011, 02:02 PM   #3
BobC
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
Quote:
Originally Posted by DaleDe
It is possible to get the text out of a djvu document but you would lose all the formatting.

You can also OCR the pages.
Dale
One problem with the "text layer" in many DJVUs is that it contains misspellings and sometimes horrendous mangling of the text, caused by the original OCR. I've had some with entire pages of the text layer missing. It is often OK for its intended purpose of locating text in the main layer, but it needs massive work to recover the "original" text.
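
For anyone who wants to gauge how bad a text layer is before committing to the clean-up, here is a rough Python sketch. It assumes DjVuLibre's `djvutxt` tool is installed and that its output separates pages with form feed characters; the function names and the thresholds (`min_chars`, `min_alpha_ratio`) are made up for illustration and would need tuning for real books.

```python
import subprocess


def extract_text_layer(djvu_path):
    """Return the hidden text layer as one string, using DjVuLibre's djvutxt.

    Assumes djvutxt is on the PATH and prints to stdout when no output
    file is given.
    """
    result = subprocess.run(["djvutxt", djvu_path], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def suspicious_pages(text, min_chars=20, min_alpha_ratio=0.6):
    """Flag pages whose text layer looks missing or badly mangled.

    Pages are assumed to be separated by form feeds. The thresholds are
    arbitrary illustrative values, not anything defined by DjVu.
    """
    if text.endswith("\f"):
        text = text[:-1]  # drop a trailing separator so it is not counted as a page
    flagged = []
    for number, page in enumerate(text.split("\f"), start=1):
        visible = [c for c in page if not c.isspace()]
        if len(visible) < min_chars:
            flagged.append((number, "empty or near-empty"))
            continue
        letters = sum(c.isalpha() for c in visible)
        if letters / len(visible) < min_alpha_ratio:
            flagged.append((number, "mostly non-letters"))
    return flagged
```

Running `suspicious_pages(extract_text_layer("book.djvu"))` (with a hypothetical `book.djvu`) gives a list of page numbers to look at first. It only catches the crude cases: a page of plausible-looking but misspelled words will pass straight through, so it does not replace proofreading.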

Re-OCRing the pages might produce better results but that will depend on the quality of the images that make up the DJVU.

I have done a couple of these (building FB2 and EPUB rather than PDF, but the principle is the same) and I can say it is not a job for the faint-hearted: there is an awful lot of manual work needed.

BobC

Last edited by BobC; 09-24-2011 at 02:05 PM. Reason: Minor clarification