Old 09-13-2011, 09:41 PM   #176
KevinH
Sigil Developer
Hi seibert,

Thanks! That helps. I can now decipher the TAGX and find the bitmaps that are used to encode the record type information. I can guess at what each tag byte means, but that is only a guess. Is there any place that documents the meaning of each tag value, or did you have to reverse engineer them from the kindlegen program?



For the record, here is what we know/guess based on the work done so far:

Code:
Tag     Decimal  Meaning
0x01    01       position in the file for the link destination
0x02    02       length / size
0x03    03       title/label offset into CTOC
0x04    04       depth/level of heading (0 = toplevel, 1 = one level down, etc)
0x05    05       class/kind offset into CTOC
0x15    21       parent record number
0x16    22       first child record number
0x17    23       last child record number
These map exactly onto the corresponding entries that calibre uses in its indexer.py (which also defines a few extra tags):

Code:
class IndexEntry(object):

    TAG_VALUES = {
            'offset': 1,
            'size': 2,
            'label_offset': 3,
            'depth': 4,
            'class_offset': 5,
            'secondary': 11,
            'parent_index': 21,
            'first_child_index': 22,
            'last_child_index': 23,
            'image_index': 69,
            'desc_offset': 70,
            'author_offset': 73,
    }

So I guess we will have to work with that. We can try to modify the code to use your TAGX parsing routine to get the tag values and bit masks and then use those to decipher the "type" entry.
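
As a rough sketch of that idea (not tested against a real file): assume the TAGX parsing routine hands back a list of (tag, mask) pairs, and that a tag is present in an entry when its mask bits are set in the entry's control byte. The function name tags_present, the (tag, mask) list shape, and the example masks below are all made up for illustration; only the tag numbers and names come from the table and calibre's TAG_VALUES above.

```python
# Tag numbers from the table above; names follow calibre's TAG_VALUES.
TAG_NAMES = {
    1: 'offset',
    2: 'size',
    3: 'label_offset',
    4: 'depth',
    5: 'class_offset',
    21: 'parent_index',
    22: 'first_child_index',
    23: 'last_child_index',
}


def tags_present(control_byte, tagx_entries):
    """Return the names of the tags whose mask bits are set in control_byte.

    tagx_entries is a hypothetical list of (tag, mask) pairs, assumed to be
    what a TAGX parsing routine would produce.
    """
    names = []
    for tag, mask in tagx_entries:
        if control_byte & mask:
            # Tags we have no guess for yet are reported by number.
            names.append(TAG_NAMES.get(tag, 'unknown_%d' % tag))
    return names


# Made-up masks for illustration only, NOT taken from a real TAGX section.
example_tagx = [(1, 0x01), (2, 0x02), (3, 0x04), (4, 0x08),
                (5, 0x10), (21, 0x20), (22, 0x40), (23, 0x80)]

print(tags_present(0x0F, example_tagx))
```

With those made-up masks, a control byte of 0x0F would mean the entry carries offset, size, label_offset and depth but none of the parent/child tags.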

Thanks,

Kevin

Last edited by KevinH; 09-13-2011 at 09:54 PM.