01-27-2012, 12:16 PM   #274
KevinH
Sigil Developer
Hi Nick,

I am integrating your changes into my own version of mobi_unpack_update5 (a few minor updates to what DiapDealer posted, to make it more robust when no CSS is provided, no NCX exists, etc.), and I can't figure out the following.

In your split version you use as mobi header offsets:

first_content_index = 192 (or 0xc0 hex)
last_content_index = 194 (or 0xc2 hex)

You never access first_content_index, but you do read last_content_index as a big-endian unsigned short (>H) to find lastimage:

lastimage = getint(datain_rec0,last_content_index,'H')
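The post does not show getint itself. A minimal sketch, assuming it is a thin wrapper around struct.unpack_from that reads one big-endian unsigned value at a byte offset of record 0 (this getint and the zero-filled sample record are hypothetical, not Nick's actual code):

```python
import struct

def getint(data, offset, fmt='L'):
    """Hypothetical helper: read one big-endian unsigned int.

    fmt 'H' reads 2 bytes, fmt 'L' reads 4 bytes.
    """
    value, = struct.unpack_from('>' + fmt, data, offset)
    return value

last_content_index = 0xc2  # 194, the offset used in the split version

# Fake record 0: all zeros, with 0x0123 stored at 0xc2 for illustration.
datain_rec0 = bytearray(0x100)
struct.pack_into('>H', datain_rec0, last_content_index, 0x0123)

lastimage = getint(datain_rec0, last_content_index, 'H')
print(lastimage)  # 291
```

Note that reading the same bytes as >L starting at 0xc0 also yields 291 here, because the two >H fields at 0xc0/0xc2 occupy the same four bytes as one >L field at 0xc0.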

Yet my updated mobi_unpack code, which was based on testing kindlegen output both when no CSS is provided (so the rawML never needs to be split, since there are no additional flow pieces) and when multiple CSS sheets are provided (multiple flow pieces, or SVG pieces), uses the following:

# need to use the FDST record to find out how to properly unpack
# the rawML into pieces
# it is simply a table of start and end locations for each flow piece
self.fdst = 0xffffffff
self.fdst, = struct.unpack_from('>L', self.header, 0xc0)
self.fdstcnt, = struct.unpack_from('>L', self.header, 0xc4)
# if cnt is 1 or less, fdst section number can be garbage
if self.fdstcnt <= 1:
    self.fdst = 0xffffffff
if self.fdst != 0xffffffff:
    self.fdst += self.start
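Once self.fdst points at the FDST record, the comment above describes it as a table of start and end locations for each flow piece. A minimal sketch of reading that table and slicing the rawML, assuming this record layout: the 4-byte magic FDST, a >L offset to the table at 0x04, a >L entry count at 0x08, then count pairs of >L start/end values. The function names and the sample data are hypothetical:

```python
import struct

def parse_fdst(record):
    """Return a list of (start, end) byte offsets, one per flow piece."""
    if record[0:4] != b'FDST':
        raise ValueError('not an FDST record')
    # Assumed layout: >L offset to the table, then >L number of entries.
    table_offset, count = struct.unpack_from('>LL', record, 4)
    values = struct.unpack_from('>%dL' % (2 * count), record, table_offset)
    # Pair up consecutive values as (start, end).
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

def split_rawml(rawml, fdst_record):
    """Slice the rawML into flow pieces using the FDST start/end table."""
    return [rawml[start:end] for start, end in parse_fdst(fdst_record)]

# Hypothetical example: an XHTML flow (16 bytes) followed by one CSS flow (15 bytes).
rawml = b'<html>...</html>' + b'p { margin: 0 }'
fdst = b'FDST' + struct.pack('>LL', 12, 2) + struct.pack('>4L', 0, 16, 16, 31)
flows = split_rawml(rawml, fdst)
print(flows)
```

When fdstcnt is 1 or less, the code above ignores the FDST pointer entirely, so the whole rawML is treated as a single flow and no split is needed.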


But *only* when this is inside a KF8 MOBI header:

The mobi_unpack code uses:

# Offset Format Meaning
# ------ ------ -------------
# 0xc0 >L FDST start
# 0xc4 >L Number of records inside FDST

So it appears to me that either 0xc0 starts a variable-length field in a structure whose proper indicator we have yet to find, or its size and meaning differ between older MOBI headers and newer (KF8) MOBI headers.

older mobi header

# Offset Format Meaning
# ------ ------ -------------
# 0xc0 >H first_content_index
# 0xc2 >H last_content_index

kf8 mobi header

# Offset Format Meaning
# ------ ------ -------------
# 0xc0 >L FDST start
# 0xc4 >L Number of records inside FDST
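If the two-layout reading is right, code that reads record 0 has to pick an interpretation before it touches 0xc0. A minimal sketch under that assumption; deciding whether a given header is KF8 is left to the caller, and both the is_kf8 flag and the function name are hypothetical:

```python
import struct

def read_c0_fields(rec0, is_kf8):
    """Interpret the bytes at 0xc0 according to the header type (assumed layouts)."""
    if is_kf8:
        # KF8 header: two 4-byte fields.
        fdst, fdst_count = struct.unpack_from('>LL', rec0, 0xc0)
        return {'fdst_start': fdst, 'fdst_count': fdst_count}
    # Older MOBI header: two 2-byte fields.
    first_content, last_content = struct.unpack_from('>HH', rec0, 0xc0)
    return {'first_content_index': first_content,
            'last_content_index': last_content}

# The same 8 bytes at 0xc0, read both ways.
rec0 = bytearray(0x100)
struct.pack_into('>HHL', rec0, 0xc0, 1, 42, 0)
print(read_c0_fields(rec0, is_kf8=False))
print(read_c0_fields(rec0, is_kf8=True))
```

Read with the wrong layout, the old pair (1, 42) collapses into a single FDST start of 65578 with a count of 0, so the bytes alone cannot tell you which meaning applies.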

Is this your understanding as well?

Thanks,

KevinH