MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 07-14-2014, 10:00 AM

Hi tkeo,

I still don't like the comparison against sys.maxint, as that value changes from machine to machine. I simply want to check for one specific missing value, 0xffffffff, as we do with the start offset later on in KindleUnpack and in many places in the header. I will fix that. If it is some other invalid value, I want to know about it and let the program barf appropriately so we can figure out how they have changed the setting of CoverOffset. I will add my fix to the dump EXTH code as well. Also, do you have a specific testcase you use with that?
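To illustrate the point, here is a minimal sketch (not the KindleUnpack code; `MISSING_U32` and `is_missing` are hypothetical names) of testing a 32-bit field against the one fixed sentinel instead of a platform-dependent limit. It is written so that it also runs on Python 3.

```python
import struct

# Fixed sentinel that marks a 32-bit field as "no value".
# Unlike sys.maxint (Python 2), which is 2**31 - 1 or 2**63 - 1
# depending on the platform, this constant is the same everywhere.
MISSING_U32 = 0xffffffff


def is_missing(raw4):
    """Return True if a 4-byte big-endian field holds the sentinel."""
    value, = struct.unpack('>L', raw4)
    return value == MISSING_U32


print(is_missing(b'\xff\xff\xff\xff'))
print(is_missing(b'\x00\x00\x00\x01'))
```

Any other out-of-range value is deliberately not treated as missing here, so it still surfaces later.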

Thanks for catching the extra quotes bug in mobi_k8resc.py. I will remove the extra CRs (carriage returns) from prefs.py to keep it consistent with the other files.

Edit:

Here is how I am now handling the potentially missing CoverOffset issue (if that is what it even is). I suspect that someone has used an improperly written metadata editor and messed up the EXTH size fields somehow. If that is the case, I would rather we fail out, as that will help us better detect where and when this is happening.

From mobi_header.py in parseMetaData(self)

Code:

        if self.hasExth:
            extheader=self.exth
            _length, num_items = struct.unpack('>LL', extheader[4:12])
            extheader = extheader[12:]
            pos = 0
            for _ in range(num_items):
                id, size = struct.unpack('>LL', extheader[pos:pos+8])
                content = extheader[pos + 8: pos + size]
                if id in MobiHeader.id_map_strings.keys():
                    name = MobiHeader.id_map_strings[id]
                    addValue(name, unicode(content, codec).encode('utf-8'))
                elif id in MobiHeader.id_map_values.keys():
                    name = MobiHeader.id_map_values[id]
                    if size == 9:
                        value, = struct.unpack('B',content)
                        addValue(name, str(value))
                    elif size == 10:
                        value, = struct.unpack('>H',content)
                        addValue(name, str(value))
                    elif size == 12:
                        value, = struct.unpack('>L',content)
                        # handle special case of missing CoverOffset
                        if id != 201 or value != 0xffffffff:
                            addValue(name, str(value))
                    else:
                        print "Warning: Bad key, size, value combination detected in EXTH ", id, size, content.encode('hex')
                        addValue(name, content.encode('hex'))
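
For anyone who wants to try the idea outside KindleUnpack, here is a self-contained sketch of the same loop that also runs on Python 3. It is an illustration under stated assumptions, not the actual mobi_header.py code: `parse_exth_values`, `make_record`, `SIZE_TO_FORMAT` and the sample ids 202 and 203 are hypothetical, and since the excerpt above stops before `pos` is advanced, the sketch assumes each record's size field counts its own 8-byte header, as the slicing in the excerpt implies.

```python
import struct

COVER_OFFSET_ID = 201      # EXTH id special-cased in the excerpt above
MISSING_U32 = 0xffffffff   # sentinel meaning "no value"
SIZE_TO_FORMAT = {9: 'B', 10: '>H', 12: '>L'}  # record size -> struct format


def make_record(rec_id, payload):
    """Hypothetical helper: pack one EXTH record (8-byte header + payload)."""
    return struct.pack('>LL', rec_id, 8 + len(payload)) + payload


def parse_exth_values(exth, value_ids):
    """Return (id, value) pairs for the numeric records listed in value_ids."""
    _length, num_items = struct.unpack('>LL', exth[4:12])
    records = exth[12:]
    pos = 0
    found = []
    for _ in range(num_items):
        rec_id, size = struct.unpack('>LL', records[pos:pos + 8])
        content = records[pos + 8:pos + size]
        pos += size  # assumed: size counts the 8-byte record header too
        if rec_id not in value_ids:
            continue
        fmt = SIZE_TO_FORMAT.get(size)
        if fmt is None:
            # Unexpected size: keep the raw bytes as hex, as the excerpt does.
            found.append((rec_id, content.hex()))
            continue
        value, = struct.unpack(fmt, content)
        # Skip only the one known "missing" CoverOffset value.
        if rec_id == COVER_OFFSET_ID and value == MISSING_U32:
            continue
        found.append((rec_id, value))
    return found


# Build a small synthetic EXTH block with three records.
body = (make_record(COVER_OFFSET_ID, struct.pack('>L', MISSING_U32))
        + make_record(202, struct.pack('>L', 5))
        + make_record(203, b'\x01'))
sample = b'EXTH' + struct.pack('>LL', 12 + len(body), 3) + body
print(parse_exth_values(sample, {COVER_OFFSET_ID, 202, 203}))
```

Only the exact 0xffffffff CoverOffset is dropped; any other odd value is still reported, which matches the intent of failing loudly on unknown cases.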

Thanks,

KevinH

Quote:

Originally Posted by tkeo

Hi Kevin,

I have tested with 10 mobi files, 2 of which have HD images and 1 of which has no RESC. The split files are identical to the ones generated by the older mobi_split.py.

I have fixed a bug in taginfo_toxml() of mobi_k8resc.py and modified mobi_header.py.

I have changed these entries to

508 : 'Unknown_Title_Furigana?_(508)',
517 : 'Unknown_Creator_Furigana?_(517)',
522 : 'Unknown_Publisher_Furigana?_(522)',

in dump_contexth(cpage, extheader).
Those in class MobiHeader are not changed.

I have modified this part too, since int('0xffffffff') cannot be converted to a long integer.

Code:

>>> int('0xffffffff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0xffffffff'
>>>
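
For reference, the conversion fails because `int()` defaults to base 10. A short sketch of two ways to parse such a string (the variable names are only illustrative, and this is not necessarily what the patch does):

```python
text = '0xffffffff'

# An explicit base of 16 accepts the optional '0x' prefix.
value = int(text, 16)

# Base 0 infers the base from the prefix, as for a Python literal.
same_value = int(text, 0)

print(value == same_value == 0xffffffff)
```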

I attach a patch. Hopefully, it is the final patch!

BTW,
prefs.py has CRLF line ending instead of LF.
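
A minimal sketch of normalizing such line endings, assuming the file contents are handled as bytes (`crlf_to_lf` is a hypothetical helper, not part of KindleUnpack):

```python
def crlf_to_lf(data):
    """Return bytes with every CRLF pair replaced by a single LF."""
    return data.replace(b'\r\n', b'\n')


print(crlf_to_lf(b'a = 1\r\nb = 2\r\n'))
```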

Take care,
tkeo