01-01-2010, 06:31 PM   #74
chorpler
Zealot
chorpler has a complete set of Star Wars action figures.
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
Quoting from my post on Darkreverser's blog:

Looking at the dictionary file for a popular astronomy book ("Death by Black Hole"), I think it looks a lot like badly OCRed text, with entries like “solar,yslen,” (presumably “solar system”), “space,huttlc” (presumably “space shuttle”), and “rainhow” (clearly meant to be “rainbow”). This could be a bad sign, since it might mean that Topaz books contain only a page scan plus an OCR pass, not a real text format that would let us export them to another format.
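To put a rough number on "looks like bad OCR": here's a quick sketch (just an illustration, nothing to do with the actual Topaz format) that computes the plain Levenshtein edit distance between each quoted dictionary entry and the word I assume it was supposed to be. The pairings are my guesses, and the `levenshtein` function is a generic textbook implementation, not anything from the Kindle software.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

# Entries quoted from the dictionary file, paired with the words
# I'm *assuming* they were meant to be.
pairs = [
    ("solar,yslen,", "solar system"),
    ("space,huttlc", "space shuttle"),
    ("rainhow", "rainbow"),
]

for entry, intended in pairs:
    print(f"{entry!r} -> {intended!r}: {levenshtein(entry, intended)} edits")
```

The distances come out small (1 for “rainhow”, 3 for “space,huttlc”, 5 for “solar,yslen,”), and the differences are the kind of single-character confusions you'd expect from a scanner (b read as h, e read as c, a comma swallowing a space and an "s") rather than anything like an encoding or compression scheme.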

These results do fit this particular book, though, because it displays very poorly, with large gaps in the letters themselves (as if the book had been scanned with the brightness set too high, so the thinner strokes of the letters were washed out). So if the dictionary file contains OCRed (and not even proofread) versions of the words on each page, well, this book isn’t going to be very exportable. If we could reconstruct the page images we’d be able to re-run our own OCR and proofread the result, but what a pain.

The one thing that makes me doubt this conclusion is that you’re supposed to be able to search Topaz files, right? But on my Kindle you can only search via an index of all the books, and the Kindle for PC app doesn’t seem to have any search function that I can find. So I can’t test whether searching for “solar,yslen,” actually lands on the spot that reads “solar system” in the rendered page.

Anybody know what format the glyphs are stored in?