05-25-2015, 06:30 AM   #3
BobC
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo (and HD), Energy Sistem EReader Pro +
The text layer of DJVU files from Archive.org is usually raw OCR output, intended for locating text on the image layer.

In my experience it is a relatively poor source for an e-book unless it is edited extensively, comparing it by eye against the image text.

For the books I have worked on this way, I imported the OCR'd text layer into LibreOffice Writer, then edited it on one PC while viewing the DJVU image on another. Only when I was happy with the edited text did I convert it to EPUB for input to Calibre. The sort of text manipulation needed to clean up such poor OCR text needs a proper word processor.
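
Some of the purely mechanical tidying could probably be scripted before the text goes into the word processor. Below is a minimal sketch in Python, assuming the text layer has already been saved as plain text. The function name `tidy_ocr_text` is hypothetical and not part of any tool mentioned here. It only rejoins words split by a hyphen at a line end and unwraps hard line breaks within paragraphs; it cannot spot misread characters, so the comparison by eye against the image is still needed.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Mechanical cleanup of OCR text (hypothetical helper, illustration only)."""
    # Normalise line endings so the patterns below only deal with "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    # Caveat: a genuine hyphen at a line end ("well-\nknown") is also joined,
    # so the result still needs checking against the page image.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat blank lines as paragraph breaks, then unwrap each paragraph
    # by collapsing every run of whitespace (including newlines) to one space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with a single blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "The quick brown fox jum-\nped over the\nlazy   dog.\n\n\nSecond para-\ngraph here.\n"
    print(tidy_ocr_text(sample))
```

The output could then be opened in LibreOffice Writer for the real editing pass.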

BobC