MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 09-12-2011, 04:50 PM

Hi,

Okay, I looked more at this index material. It appears the "type" value is the key to knowing how to read in each indx entry.

For example, to correctly parse the indx entries, I had to do something like the following:

if type == 0x1f:
    # handle the next two variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 1 is %s" % unk1)
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 2 is %s" % unk2)
elif type == 0xdf:
    # handle the next four variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 1 is %s" % unk1)
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 2 is %s" % unk2)
    pos, unk3 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 3 is %s" % unk3)
    pos, unk4 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 4 is %s" % unk4)
elif type == 0x3f:
    # handle the next three variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 1 is %s" % unk1)
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 2 is %s" % unk2)
    pos, unk3 = getVariableWidthValue(navdata, offset)
    offset += pos
    print("unknown 3 is %s" % unk3)

With the unknowns read this way, there is no need to look for or skip 0x80 values.
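The per-type handling above could also be written as a small lookup table. This is only a sketch in Python 3: the body of getVariableWidthValue is not shown in this post, so the decoder below assumes the forward variable-width encoding (7 data bits per byte, with the high bit marking the last byte of a value). The names get_variable_width_value, UNKNOWNS_PER_TYPE and read_unknowns are made up for illustration, and the counts in the table are just the ones observed above.

```python
def get_variable_width_value(data, offset):
    """Decode one variable-width integer from `data` starting at `offset`.

    Assumed encoding (not confirmed by the post): each byte carries 7 data
    bits, most significant group first, and the byte with the high bit
    (0x80) set is the last byte of the value.
    Returns (bytes_consumed, value), the same order as (pos, value) above.
    """
    value = 0
    consumed = 0
    while True:
        byte = data[offset + consumed]  # bytes indexing yields an int in Python 3
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return consumed, value


# Number of variable-width unknowns seen after each entry type in the post.
UNKNOWNS_PER_TYPE = {0x1F: 2, 0x3F: 3, 0xDF: 4}


def read_unknowns(navdata, offset, entry_type):
    """Read the unknowns for one entry; return (new_offset, list_of_values)."""
    values = []
    for _ in range(UNKNOWNS_PER_TYPE[entry_type]):
        consumed, value = get_variable_width_value(navdata, offset)
        offset += consumed
        values.append(value)
    return offset, values
```

Under this assumed encoding, a lone 0x80 byte decodes to the value 0 and uses up one byte. An entry type that is not in the table raises a KeyError, which at least flags a type that has not been looked at yet.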

Also, the count is not the same as the number of entries in the CTOC.

In my set of ebooks, the CTOC data has variable length and always ends with two null bytes ('\0\0').
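One way to use that observation is to scan for the double null to find where the CTOC data stops. This is a rough sketch in Python 3 that relies only on the pattern seen in my ebooks, not on any spec. The name find_ctoc_end is hypothetical, and it would stop too early if two null bytes can ever appear inside the CTOC data itself.

```python
def find_ctoc_end(data, start=0):
    """Return the offset just past the first pair of consecutive null bytes
    found at or after `start`, or -1 if no such pair exists.

    Assumption: the CTOC data is terminated by two null bytes, as observed
    in the sample ebooks; this is not taken from a format specification.
    """
    idx = data.find(b"\x00\x00", start)
    return -1 if idx < 0 else idx + 2
```

If the CTOC starts at ctoc_start, then data[ctoc_start:find_ctoc_end(data, ctoc_start)] would be the whole block including the terminator, provided the result is not -1.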

So I have attached a mobiunpack_test.py program that modifies things to work with a real Amazon mobi ebook (as opposed to calibre-generated ones).

Perhaps this might help others trying to track things down.

I am going to try and figure out what each of these unknowns actually means.

Hope this helps,

KevinH

Post #164 (index support). Last edited by KevinH; 09-15-2011 at 06:55 PM.