Old 01-17-2010, 01:28 AM   #318
labba
Member
Posts: 23
Karma: 752
Join Date: Dec 2009
Device: none
From DarkRevers Blog:

Quote:
Converting Topaz to HTML

This is experimental and it will probably not work for you but…

ALSO: Please do not use any of this to steal. Theft is wrong.

This is only meant to allow conversion of Topaz books
for other book readers you own.

Here are the steps:

1. First, you need the Python scripts in topazscripts.zip, which do the translation from Topaz to HTML.

The files you should have after unzipping are:

cmbtc_dump.py – (author: cmbtc) decrypts the book and dumps all of its sections to files, properly numbered and named

decode_meta.py – converts metadata0000.dat to human readable text

convert2xml.py – converts page*.dat, other*.dat, and glyphs*.dat files to their “pseudo” xml descriptions.

flatxml2html.py – converts a “flattened” xml description to html using the ocrtext and markup as its basis.

stylexml2css.py – converts stylesheet “flattened” xml from other0000.dat into css (as best it can – mainly supporting paragraph style classes)

genxml.py – main program to convert everything to xml

genhtml.py – main program to generate “book.html”
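If you want to confirm the unzip gave you everything in the list above, something like this might help. It is only a sketch: the script names are taken from the list, but the function name missing_scripts and the idea that all scripts sit in one directory are my assumptions, not part of topazscripts.zip.

```python
from pathlib import Path

# Script names exactly as listed in the post.
EXPECTED_SCRIPTS = [
    "cmbtc_dump.py",
    "decode_meta.py",
    "convert2xml.py",
    "flatxml2html.py",
    "stylexml2css.py",
    "genxml.py",
    "genhtml.py",
]


def missing_scripts(script_dir):
    """Return the expected script names not found in script_dir.

    Hypothetical helper: assumes the zip was unpacked into a single
    directory with no subfolders.
    """
    root = Path(script_dir)
    return [name for name in EXPECTED_SCRIPTS if not (root / name).is_file()]
```

An empty list back from missing_scripts("path/to/scripts") means all seven files are there.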

2. You must remove the DRM from the Topaz book and build a
directory of its contents using the following command:

cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE

This should create a directory called “TARGETDIR” in your current directory.

It should have the following files in it:

metadata0000.dat – metadata info
other0000.dat – information used to create a style sheet
dict0000.dat – dictionary of words used to build page descriptions
page – directory filled with page*.dat files
glyphs – directory filled with glyphs*.dat files
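Before moving on to step 3, it may be worth checking that TARGETDIR really looks like the list above. A minimal sketch, assuming the layout is exactly as described (three .dat files at the top level, plus page and glyphs subdirectories); check_dump_layout is a made-up name, not something from the scripts.

```python
from pathlib import Path


def check_dump_layout(target_dir):
    """Return a list of problems found in the dump directory.

    Hypothetical helper based only on the file list in the post.
    An empty list means the layout matches that description.
    """
    root = Path(target_dir)
    problems = []

    # Top-level files named in the post.
    for name in ("metadata0000.dat", "other0000.dat", "dict0000.dat"):
        if not (root / name).is_file():
            problems.append("missing file: " + name)

    # Subdirectories that should hold page*.dat and glyphs*.dat files.
    for sub in ("page", "glyphs"):
        folder = root / sub
        if not folder.is_dir():
            problems.append("missing directory: " + sub)
        elif not any(folder.glob(sub + "*.dat")):
            problems.append("no " + sub + "*.dat files in " + sub)

    return problems
```

If check_dump_layout("TARGETDIR") returns anything, step 2 probably did not finish cleanly and the later steps are unlikely to work.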

3. Convert the files in “TARGETDIR” to their xml descriptions.
Please note, this Python program uses “decode_meta.py” and “convert2xml.py”, so don’t move them.

genxml.py TARGETDIR

4. Next, attempt a conversion to html, where “TARGETDIR” is the directory
that was created in step 2. Please note, this Python program uses “decode_meta.py”, “convert2xml.py”, “flatxml2html.py”, and “stylexml2css.py”, so don’t move them.

genhtml.py TARGETDIR
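Since steps 3 and 4 both depend on helper scripts staying put, here is a rough way to check for them before running either main program. The helper lists come straight from the notes above; the assumption that the helpers must sit in the same directory as the main script is mine, and missing_helpers is a hypothetical name.

```python
from pathlib import Path

# Which helper scripts each main program uses, per the post.
HELPERS = {
    "genxml.py": ["decode_meta.py", "convert2xml.py"],
    "genhtml.py": [
        "decode_meta.py",
        "convert2xml.py",
        "flatxml2html.py",
        "stylexml2css.py",
    ],
}


def missing_helpers(main_script):
    """Return helper scripts not found next to main_script.

    Assumption (not stated in the post): helpers are looked up in the
    same directory as the main program.
    """
    main = Path(main_script)
    return [h for h in HELPERS[main.name] if not (main.parent / h).is_file()]
```

For example, missing_helpers("scripts/genhtml.py") should come back empty if nothing has been moved.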

Once it completes:

You should have created the file “book.html” inside of TARGETDIR

You should also have created the directory xml inside of TARGETDIR
which has the full xml descriptions of the pages and glyphs for later
(better) conversion attempts.
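To check the results described above without opening a file browser, a small sketch like this could do. It only looks for the two things the post names (book.html and the xml directory inside TARGETDIR); conversion_outputs is a hypothetical name.

```python
from pathlib import Path


def conversion_outputs(target_dir):
    """Report whether the outputs named in the post exist.

    Hypothetical helper: returns a dict mapping each expected output
    to True or False.
    """
    root = Path(target_dir)
    return {
        "book.html": (root / "book.html").is_file(),
        "xml": (root / "xml").is_dir(),
    }
```

Both values being True for conversion_outputs("TARGETDIR") only tells you the files exist, not that the html is any good, so you will still want to open book.html and look.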

You can’t post a zip on pastebin.com, so we really need someplace/someone to host these. If that is something you are willing to do, PM me on Mobileread and I will get the scripts to you.

One warning: this is not the best long-term solution, because much of the layout is only really correct when drawn to the screen (as an svg). Until that solution exists, this should get you something that you can load into Sigil, clean up, and make into an ePub that you can then convert to other formats.