MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 09-13-2011, 09:41 PM

Hi seibert,

Thanks! That helps. I can now decipher the TAGX and find the bitmaps that are used to encode the record type information. I can guess at what each tag byte means, but that is only a guess. Is there any place that documents the meaning of each tag value, or did you have to reverse engineer them from the kindlegen program?

For the record, here is what we know/guess based on the work done so far:

Code:

Tag     Decimal  Meaning
0x01    01       position in the file for the link destination
0x02    02       length / size
0x03    03       title/label offset into CTOC
0x04    04       depth/level of heading (0 = toplevel, 1 = one level down, etc)
0x05    05       class/kind offset into CTOC
0x15    21       parent record number
0x16    22       first child record number
0x17    23       last child record number

which maps exactly to the values calibre uses for those tags in its indexer.py (calibre also defines a few more):

Code:

class IndexEntry(object):

    TAG_VALUES = {
            'offset': 1,
            'size': 2,
            'label_offset': 3,
            'depth': 4,
            'class_offset': 5,
            'secondary': 11,
            'parent_index': 21,
            'first_child_index': 22,
            'last_child_index': 23,
            'image_index': 69,
            'desc_offset': 70,
            'author_offset': 73,
    }
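
When decoding, the lookup usually goes the other way: a tag number is read from the file and needs a name. A minimal sketch of that reverse lookup, reusing the numbers from the calibre excerpt above. `TAG_NAMES` and `describe_tag` are hypothetical names made up for this illustration; they are not part of calibre or of the unpack script.

```python
# Tag numbers copied from the calibre excerpt above.
TAG_VALUES = {
    'offset': 1,
    'size': 2,
    'label_offset': 3,
    'depth': 4,
    'class_offset': 5,
    'secondary': 11,
    'parent_index': 21,
    'first_child_index': 22,
    'last_child_index': 23,
    'image_index': 69,
    'desc_offset': 70,
    'author_offset': 73,
}

# Hypothetical helper table: tag number -> name.
TAG_NAMES = {number: name for name, number in TAG_VALUES.items()}


def describe_tag(tag):
    """Return a readable name for a tag number, or a placeholder if unknown."""
    return TAG_NAMES.get(tag, 'unknown_0x%02x' % tag)
```

Tags that are not in the table come back with a placeholder name instead of raising an error, which is convenient while the meaning of some tag values is still a guess.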

So I guess we will have to work with that. We can try to modify the code to use your TAGX parsing routine to get the tag values and bit masks and then use those to decipher the "type" entry.
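
A rough sketch of what that could look like. The layout it assumes is reverse engineered and may be wrong or incomplete: the section starts with `TAGX`, followed by a 4-byte big-endian section length and a 4-byte control byte count, then 4-byte entries holding tag, values per entry, bit mask, and an end flag. `parse_tagx` and `tags_present` are hypothetical names for this illustration, not the actual routines discussed in this thread.

```python
import struct


def parse_tagx(data, start=0):
    """Return (control_byte_count, table) for the TAGX section at `start`.

    ASSUMED layout (reverse engineered, not officially documented):
    b'TAGX', uint32 section length, uint32 control byte count, then
    4-byte entries: (tag, values_per_entry, mask, end_flag).
    """
    if data[start:start + 4] != b'TAGX':
        raise ValueError('no TAGX section at offset %d' % start)
    length, control_byte_count = struct.unpack_from('>LL', data, start + 4)
    table = []
    for pos in range(start + 12, start + length, 4):
        table.append(struct.unpack_from('>4B', data, pos))
    return control_byte_count, table


def tags_present(control_bytes, table):
    """List (tag, values_per_entry, count) for tags flagged in the control bytes.

    `count` is None when every bit of a multi-bit mask is set; this sketch
    assumes the real count is then stored elsewhere in the entry and does
    not try to read it.
    """
    present = []
    index = 0
    for tag, values_per_entry, mask, end_flag in table:
        if end_flag == 0x01:
            index += 1  # assumed: an end-flag entry moves to the next control byte
            continue
        value = control_bytes[index] & mask
        if value == 0:
            continue  # tag not present in this index entry
        if value == mask and bin(mask).count('1') > 1:
            count = None
        else:
            # Shift the masked value down by the mask's trailing zero bits.
            shift = (mask & -mask).bit_length() - 1
            count = value >> shift
        present.append((tag, values_per_entry, count))
    return present
```

The tag numbers that come out of `tags_present` can then be looked up in the table above (1 = position, 2 = length, and so on) to work out what each value in the index entry means.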

Thanks,

Kevin

Post #176 · Last edited by KevinH; 09-13-2011 at 09:54 PM.