Quote:
Originally Posted by Begemot
OP here. I resorted to using Export Text in WinDjView.
This gets you a text dump with no formatting whatsoever. For my Libre it works well enough, but in general, this procedure is suboptimal.
Most DJVU files do seem to have a text layer (unless there is some on-the-fly OCR happening when you select an area on the page, which seems unlikely).
Thus, there must be a way (at least theoretically, until someone writes a converter) to preserve the formatting in the text layer.
I can assure you that the text layer is just that - text; its purpose is simply to provide the search capability. There is no formatting, and in many books there are OCR "mis-reads".
If you want to understand DJVUs then you need to get the spec and study it. I've done quite a bit of work adding TOCs to existing DJVUs and have converted a couple of books to FB2 - this involves manually proof-reading and correcting the dumped text, then formatting it to match the original (italics, bold, etc.).
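To give a rough idea of what the first clean-up pass on a dumped text can look like, here is a minimal Python sketch. It is not the tooling I use, just an illustration: the function names are made up for this example, and it assumes the dump is plain text in which words are split across lines with a trailing hyphen and OCR mis-reads often show up as digits inside words.

```python
import re
from typing import List


def rejoin_hyphenated(text: str) -> str:
    """Join words split across a line break, e.g. 'for-\\nmatting' -> 'formatting'.

    Assumption: a hyphen directly before a newline marks a split word.
    Genuinely hyphenated words that happen to break there get joined too,
    so the result still needs proof-reading.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def flag_suspect_words(text: str) -> List[str]:
    """Return tokens that mix letters and digits, a common sign of OCR mis-reads."""
    tokens = re.findall(r"\w+", text)
    return [
        t for t in tokens
        if re.search(r"[A-Za-z]", t) and re.search(r"\d", t)
    ]


if __name__ == "__main__":
    # Hypothetical fragment of a text dump, for illustration only.
    sample = "The text lay-\ner has no for-\nmatting, and the c0w is a mis-read."
    cleaned = rejoin_hyphenated(sample)
    print(cleaned)
    print(flag_suspect_words(cleaned))
```

A pass like this only narrows down where to look. The italics, bold and so on still have to be put back by hand against the page images, because the text layer does not carry them.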
Don't expect too much out of what is a by-product of the search function.
BobC