Old 12-25-2010, 06:13 PM   #16
KevinH
Sigil Developer
Hi,

FYI: a Topaz input plugin would be a one-way street. There really is no way to convert any other XHTML-based file to the Topaz file format. The inputs it requires are really scanning based: a list of glyphs and the paths to create each glyph, the x,y position of each glyph on the page (and the glyphs are not like font glyphs, as they have no baselines), OCR info, page continuation info, dehyphenation info, fixed page format, etc.

So I actually think it would be easier to create a .tpzZ input plugin that takes the archive I have already generated (html, cover.jpeg, opf) and handles things from there. The generated html is XHTML that can be loaded directly into an lxml tree without any need for tidy or Beautiful Soup, so it is already very close to your internal calibre format as it stands right now.
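
To make that concrete, here is a rough sketch of reading such an archive and parsing the XHTML straight into a tree. It is only an illustration: the member names book.html and metadata.opf, and the helpers load_book_html and build_sample_archive, are made up for the example (the real archive layout may differ), and it falls back to the standard library's ElementTree when lxml is not installed.

```python
import io
import zipfile

try:
    from lxml import etree  # the parser mentioned in the post
except ImportError:
    # Stand-in so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

XHTML_NS = "http://www.w3.org/1999/xhtml"


def load_book_html(archive, html_name="book.html"):
    """Parse the XHTML member of a zip archive directly into an element tree.

    `html_name` is a hypothetical member name, not the real archive layout.
    """
    with zipfile.ZipFile(archive) as zf:
        data = zf.read(html_name)  # raw bytes, no tidy / Beautiful Soup pass
    return etree.fromstring(data)


def build_sample_archive():
    """Build a tiny in-memory archive with html, cover.jpeg and opf members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "book.html",
            '<html xmlns="http://www.w3.org/1999/xhtml">'
            "<head><title>Sample</title></head>"
            "<body><p>Hello</p></body></html>",
        )
        zf.writestr("cover.jpeg", b"")  # placeholder, not a real image
        zf.writestr("metadata.opf", "<package/>")  # placeholder metadata
    buf.seek(0)
    return buf


root = load_book_html(build_sample_archive())
title = root.find(".//{%s}title" % XHTML_NS)
print(title.text)
```

Because the markup is well-formed XHTML, the bytes go to the XML parser as they are, with no clean-up step in between.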

It is almost as if we need a new zip-based file type extension called ".calibre" that represents calibre's internal format. I could modify the plugin code I have now to write to that standard, and calibre could write that format as well when exporting to disk.
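
Purely as a thought experiment, writing such a container could be as simple as the sketch below. No ".calibre" format exists; the member names and the write_container helper are hypothetical, chosen only to mirror the html, cover.jpeg and opf pieces mentioned above.

```python
import io
import zipfile


def write_container(dest, html, opf, cover):
    """Write html, opf and cover into one zip container.

    The member names are assumptions for illustration, not a real standard.
    """
    with zipfile.ZipFile(dest, "w") as zf:
        zf.writestr("book.html", html)
        zf.writestr("metadata.opf", opf)
        zf.writestr("cover.jpeg", cover)


# Write to memory here; a real exporter would pass a path such as "book.calibre".
buf = io.BytesIO()
write_container(buf, "<html/>", "<package/>", b"\xff\xd8")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```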

Kevin