Hi seibert,
Thanks! That helps. I can now decipher the TAGX and find the bitmaps that are used to encode the record type information. I can guess at what each tag byte means, but that is only a guess. Is there any place that documents the meaning of each tag value, or did you have to reverse engineer them from the kindlegen program?
For the record, here is what we know/guess based on the work done so far:
Code:
Tag   Decimal  Meaning
0x01  01       position in the file for the link destination
0x02  02       length / size
0x03  03       title/label offset into CTOC
0x04  04       depth/level of heading (0 = top level, 1 = one level down, etc.)
0x05  05       class/kind offset into CTOC
0x15  21       parent record number
0x16  22       first child record number
0x17  23       last child record number
which matches what calibre uses in its indexer.py (calibre also lists a few tags that are not in the table above):
Code:
class IndexEntry(object):

    TAG_VALUES = {
        'offset': 1,
        'size': 2,
        'label_offset': 3,
        'depth': 4,
        'class_offset': 5,
        'secondary': 11,
        'parent_index': 21,
        'first_child_index': 22,
        'last_child_index': 23,
        'image_index': 69,
        'desc_offset': 70,
        'author_offset': 73,
    }
So I guess we will have to work with that. We can try to modify the code to use your TAGX parsing routine to get the tag values and bit masks and then use those to decipher the "type" entry.
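As a rough sketch of that plan (not your routine, just my reading of it), something like the following might do as a starting point. It assumes the TAGX block is the 4-byte 'TAGX' marker, a big-endian block length, a control byte count, and then 4-byte entries of (tag, values per entry, bitmask, end flag). That layout is an assumption on my part, and the names read_tagx, describe and TAG_NAMES are made up for illustration. The tag names are simply calibre's TAG_VALUES turned around.

```python
import struct

# Tag number -> name, inverted from calibre's TAG_VALUES quoted above.
TAG_NAMES = {
    1: 'offset', 2: 'size', 3: 'label_offset', 4: 'depth',
    5: 'class_offset', 11: 'secondary', 21: 'parent_index',
    22: 'first_child_index', 23: 'last_child_index',
    69: 'image_index', 70: 'desc_offset', 73: 'author_offset',
}


def read_tagx(data, start=0):
    """Return (control_byte_count, entries) for a TAGX block.

    ASSUMED layout (not confirmed by any official documentation):
    b'TAGX', uint32 block length, uint32 control byte count, then
    4-byte entries of (tag, values_per_entry, bitmask, end_flag).
    """
    if data[start:start + 4] != b'TAGX':
        raise ValueError('no TAGX block at offset %d' % start)
    length, control_byte_count = struct.unpack_from('>LL', data, start + 4)
    entries = []
    # Entries start right after the 12-byte header.
    for pos in range(start + 12, start + length, 4):
        entries.append(struct.unpack_from('>4B', data, pos))
    return control_byte_count, entries


def describe(entries):
    """Attach a guessed name to each tag, skipping end-flag entries."""
    named = []
    for tag, nvalues, mask, end_flag in entries:
        if end_flag:
            # Assumed to mark the end of a control byte; carries no tag.
            continue
        name = TAG_NAMES.get(tag, 'unknown_%d' % tag)
        named.append((name, nvalues, mask))
    return named
```

Tags that are not in the table come back as 'unknown_N', so any value we have not identified yet shows up in the output instead of being silently dropped.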
Thanks,
Kevin