MobileRead Forums - Heard back from Tor on why Topaz for some books...

Elfwreck · 03-18-2009, 11:43 AM

Quote:

Originally Posted by sirbruce

But I thought Topaz books were searchable. So they have to be actual character sets (custom fonts or not), not just images.

There's a form of scanned, searchable PDF where the OCR'd text is there but invisible; what you're seeing is the scanned image of the letters, and searches run against the hidden text. That way, the software doesn't have to figure out fonts and character spacing, which OCR programs that don't output plain text often get wrong.
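To make the "invisible text" idea concrete, here's a minimal sketch in Python. The page model (`HiddenWord`, `find_boxes`) is hypothetical, made up for illustration, and is not any PDF library's API: each OCR'd word is stored with the box it occupies on the scanned image, the text is never drawn, and a search just returns the image regions to highlight.

```python
from dataclasses import dataclass

# Hypothetical page model: the reader only ever sees the scanned image;
# the OCR'd words sit in an invisible layer, each tied to an image region.
@dataclass
class HiddenWord:
    text: str                       # OCR'd text, never drawn on screen
    box: tuple[int, int, int, int]  # (x, y, width, height) on the scanned image

def find_boxes(layer: list[HiddenWord], query: str) -> list[tuple[int, int, int, int]]:
    """Search the invisible text layer (case-insensitive, whole words);
    return the image regions the viewer should highlight."""
    q = query.lower()
    return [w.box for w in layer if w.text.lower() == q]

page = [
    HiddenWord("The", (10, 10, 30, 12)),
    HiddenWord("dragon", (45, 10, 60, 12)),
    HiddenWord("slept", (110, 10, 45, 12)),
]
print(find_boxes(page, "Dragon"))  # [(45, 10, 60, 12)]
```

An OCR mistake in the hidden layer only breaks search for that word; what you read on screen is still the original scan.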

Publishers seem to want ebooks to *look like pbooks*. They've missed the point that ebook readers don't need that and generally don't want it: one of the main features of an ebook device is reflowable text, so you can increase the font to a size you're comfortable reading.

If Topaz uses scanned images of words and "reflows" by zooming in on them and basically cutting and pasting them into new positions in the view window, that would explain why it's a glitchy format.
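Here's roughly what that kind of "reflow" could look like, as a minimal Python sketch. This is only a guess at the mechanism described above, not how Topaz actually works; `reflow`, the pixel widths, and the `gap` spacing are all hypothetical. The word images are scaled by the zoom factor and greedily packed into lines that fit the view width.

```python
def reflow(widths: list[int], zoom: float, view_width: int, gap: int = 4) -> list[list[int]]:
    """Greedily pack scaled word images into lines (hypothetical sketch).

    widths: pixel width of each word image at zoom 1.0, in reading order.
    Returns a list of lines, each a list of word indices.
    """
    lines: list[list[int]] = []
    current: list[int] = []
    used = 0  # pixels used on the current line, including gaps
    for i, w in enumerate(widths):
        scaled = round(w * zoom)  # the image is only scaled, never re-rendered
        needed = scaled if not current else used + gap + scaled
        if current and needed > view_width:
            # Word doesn't fit: close this line and start a new one with it.
            lines.append(current)
            current, used = [i], scaled
        else:
            current.append(i)
            used = needed
    if current:
        lines.append(current)
    return lines

widths = [30, 60, 45, 50]
print(reflow(widths, 1.0, 200))  # [[0, 1, 2, 3]]
print(reflow(widths, 2.0, 200))  # [[0, 1], [2, 3]]
```

Zooming in only changes which word images share a line. A word image wider than the view still gets a line to itself, because an image can't be hyphenated or split.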

If that's not what Topaz does, there's no reason not to OCR the book into a USEFUL format and work from there. (If that is what Topaz does, they might be skimping on time by not correcting the OCR errors: if you never see the actual text, only the word images, errors aren't likely to matter much as long as over 95% of the words are OCR'd correctly. And they usually are.)

And I suppose none of the publishing houses would consider going to the darknet, grabbing a scanned-and-OCR'd text copy of the book from some fan, and proofreading *that* instead of starting from scratch.