12-03-2010, 01:27 PM   #70
KevinH
Sigil Developer
Posts: 8,893
Karma: 6120478
Join Date: Nov 2009
Device: many
Quote:
Originally Posted by DiapDealer
Yes. Unfortunately not a plugin, though. The process doesn't lend itself to a plugin-type scenario. Look for a folder that has the name 'Topaz' in it...and a readme that is similarly named.
But please be forewarned: Topaz ebooks are very much like a poor man's image-only PDF file. A Topaz book is actually a binary-encoded (via a dictionary) pseudo-XML description of glyphs [themselves described by XML descriptions of the contour points that make up each glyph, which may be one or more letters or even just part of a letter] and their x,y locations on the page. It is laid out page by page, with some added Amazon-created OCRText info so that at least some string searching works on the Kindle. No true fonts per se are embedded, just pure glyph descriptions. So Topaz is really a description of the image of each page, not anything like a normal PML- or HTML-based ebook format.
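To make "binary encoded via a dictionary" a bit more concrete, here is a toy sketch. The record layout below (one byte indexing a string dictionary, then x and y as little-endian unsigned shorts) is invented purely for illustration and is NOT the real Topaz byte layout; the names `RECORD` and `decode` are hypothetical. It only shows the general idea: compact binary records whose element names are looked up in a shared dictionary and expanded back into pseudo-XML.

```python
import struct
from typing import Sequence

# Toy record layout (hypothetical, NOT Topaz's real format):
#   1 unsigned byte   -> index into the string dictionary (the element name)
#   2 unsigned shorts -> x and y position on the page, little-endian
RECORD = struct.Struct("<BHH")


def decode(blob: bytes, dictionary: Sequence[str]) -> list[str]:
    """Expand dictionary-coded binary records into pseudo-XML lines."""
    lines = []
    for name_idx, x, y in RECORD.iter_unpack(blob):
        name = dictionary[name_idx]  # element name is stored once, referenced by index
        lines.append(f'<{name} x="{x}" y="{y}"/>')
    return lines


if __name__ == "__main__":
    dictionary = ["glyph", "region"]
    blob = RECORD.pack(0, 120, 300) + RECORD.pack(0, 131, 300)
    for line in decode(blob, dictionary):
        print(line)
```

The dictionary is what keeps such a format small: a repeated element name costs one byte per occurrence instead of the full string.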

To create the HTML, the tools use the x,y position info, Amazon's OCRInfo, and the like to recreate a best guess of what each page looked like, but they will miss all italics, most non-heading bold, etc. And the result is only as good as the OCRText that Amazon includes.
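A rough idea of the "best guess" step: glyphs that sit at about the same y are treated as one line, ordered by x, and a space is inserted wherever the horizontal gap is wide enough. This is a minimal sketch of that general layout technique, not the actual code in the Topaz tools; the `Glyph` fields, `rebuild_lines`, and the tolerance values are all hypothetical. Note that nothing in a position carries style, which is why italics and most bold get lost.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str   # characters the OCR info assigns to this glyph (hypothetical field)
    x: int      # left edge on the page
    y: int      # baseline position on the page
    width: int  # horizontal extent of the glyph


def rebuild_lines(glyphs, y_tolerance=3, space_gap=4):
    """Group glyphs into text lines by y, then order each line by x.

    Only positions are used, so styling such as italics cannot be recovered.
    """
    lines, current = [], []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        # Start a new line when the baseline moves too far from the current line's.
        if current and abs(g.y - current[0].y) > y_tolerance:
            lines.append(current)
            current = []
        current.append(g)
    if current:
        lines.append(current)

    out = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        parts = [line[0].text]
        for prev, g in zip(line, line[1:]):
            # A wide gap between glyphs is guessed to be a word space.
            if g.x - (prev.x + prev.width) > space_gap:
                parts.append(" ")
            parts.append(g.text)
        out.append("".join(parts))
    return out


if __name__ == "__main__":
    page = [Glyph("Th", 10, 100, 12), Glyph("e", 22, 100, 6),
            Glyph("end", 40, 101, 18), Glyph("Ch", 10, 130, 14)]
    print(rebuild_lines(page))  # ['The end', 'Ch']
```

Everything hinges on the `text` each glyph gets from the OCR data, so bad OCR input means bad lines no matter how well the grouping works.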

When my "friend" decoded some Topaz books, some came out nearly perfect, while others were quite horrible and needed many hours of hand cleanup to fix OCR errors (each was loaded into OpenOffice.org, spell-checked in every language used in the document, and then exported to XHTML via the Writer2xhtml plugin).

The nice thing is that the tools also create exact SVG images of each page (embedded in XHTML), so with some hand work in tools like ImageMagick and Inkscape you can create an image-only PDF file. It will be huge, though.
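One way the hand work could be scripted, as a sketch only: it assumes the page SVGs have already been pulled out of the XHTML into standalone files with zero-padded names (so sorting gives page order), that Inkscape 1.x is installed (its `--export-type` / `--export-filename` options), and that pages are merged with `pdfunite` from poppler-utils. `pdfunite` is my substitution, not something the post mentions; ImageMagick could be used instead. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def svg_to_pdf_commands(svg_pages, out_dir):
    """Build one Inkscape 1.x export command per page SVG."""
    cmds = []
    for svg in svg_pages:
        pdf = Path(out_dir) / (Path(svg).stem + ".pdf")
        cmds.append(["inkscape", "--export-type=pdf",
                     f"--export-filename={pdf}", str(svg)])
    return cmds


def merge_command(pdf_pages, output):
    """Build a pdfunite (poppler-utils) command joining pages in the given order."""
    return ["pdfunite", *map(str, pdf_pages), str(output)]


def run(cmds):
    """Run each command, failing early if the tool is missing or errors out."""
    for cmd in cmds:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumes zero-padded names (page001.svg, page002.svg, ...) so sorting = page order.
    pages = sorted(Path("pages").glob("*.svg"))
    out = Path("pdf_pages")
    out.mkdir(exist_ok=True)
    run(svg_to_pdf_commands(pages, out))
    run([merge_command(sorted(out.glob("*.pdf")), "book.pdf")])
```

Each page comes out as glyph outlines with no text layer, which is why the result is both "image-only" and large.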

Not easy, but worth it for some of the history books "my friend" wanted to read on his Sony Reader. "He" simply could not find them anywhere else.