Old 07-14-2014, 10:00 AM   #912
KevinH
Sigil Developer
 
Hi tkeo,

I still don't like the comparison against sys.maxint, as its value depends on the platform the program runs on. I simply want to check for one specific "missing" value, 0xffffffff, as we do with the start offset later on in KindleUnpack and in many places in the header. I will fix that. If it is some other invalid value, I want to know about it and let the program barf appropriately, so we can figure out how they have changed the setting of CoverOffset. I will add my fix to the dump EXTH code as well. Also, do you have a specific testcase you use with that?

Thanks for catching the extra quotes bug in mobi_k8resc.py. I will remove the extra CRs (carriage returns) from prefs.py to keep it consistent with the other files.

Edit:

Here is how I am now handling the potentially missing CoverOffset issue (if that is what it even is). I suspect that someone has used an improperly written metadata editor and messed up the EXTH size fields somehow. If that is the case, I would rather we fail outright, as that will help us better detect where and when this is happening.

From mobi_header.py, in parseMetaData(self):

Code:
        if self.hasExth:
            extheader=self.exth
            _length, num_items = struct.unpack('>LL', extheader[4:12])
            extheader = extheader[12:]
            pos = 0
            for _ in range(num_items):
                id, size = struct.unpack('>LL', extheader[pos:pos+8])
                content = extheader[pos + 8: pos + size]
                if id in MobiHeader.id_map_strings.keys():
                    name = MobiHeader.id_map_strings[id]
                    addValue(name, unicode(content, codec).encode('utf-8'))
                elif id in MobiHeader.id_map_values.keys():
                    name = MobiHeader.id_map_values[id]
                    if size == 9:
                        value, = struct.unpack('B',content)
                        addValue(name, str(value))
                    elif size == 10:
                        value, = struct.unpack('>H',content)
                        addValue(name, str(value))
                    elif size == 12:
                        value, = struct.unpack('>L',content)
                        # handle special case of missing CoverOffset
                        if id != 201 or value != 0xffffffff:
                            addValue(name, str(value))
                    else:
                        print "Warning: Bad key, size, value combination detected in EXTH ", id, size, content.encode('hex')
                        addValue(name, content.encode('hex'))
                # advance to the next EXTH record (size includes the 8-byte id/size header)
                pos += size
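As a way to exercise the special case on its own, here is a minimal, self-contained sketch. It is not the KindleUnpack code: build_exth() and parse_numeric_exth() are hypothetical helpers written only for this illustration, the second record id (202) is just an arbitrary numeric key, and only 32-bit (size == 12) records are handled. It is written to run under both Python 2 and Python 3.

```python
import struct

COVER_OFFSET_ID = 201      # EXTH key for CoverOffset, as in the code above
MISSING = 0xffffffff       # the one specific "missing" value we check for


def build_exth(records):
    """Hypothetical helper: pack (id, payload) pairs into an EXTH-like blob."""
    body = b''.join(struct.pack('>LL', rid, 8 + len(data)) + data
                    for rid, data in records)
    return b'EXTH' + struct.pack('>LL', 12 + len(body), len(records)) + body


def parse_numeric_exth(exth):
    """Return {id: value} for 32-bit records, skipping a missing CoverOffset."""
    _length, num_items = struct.unpack('>LL', exth[4:12])
    data = exth[12:]
    pos = 0
    values = {}
    for _ in range(num_items):
        rid, size = struct.unpack('>LL', data[pos:pos + 8])
        content = data[pos + 8:pos + size]
        if size == 12:
            value, = struct.unpack('>L', content)
            # same test as above: drop only CoverOffset == 0xffffffff
            if rid != COVER_OFFSET_ID or value != MISSING:
                values[rid] = value
        pos += size  # size counts the 8-byte id/size header too
    return values


sample = build_exth([
    (201, struct.pack('>L', 0xffffffff)),  # CoverOffset marked as missing
    (202, struct.pack('>L', 3)),           # some other numeric record
])
result = parse_numeric_exth(sample)
```

Because the comparison is against the literal 0xffffffff rather than sys.maxint, the result should not depend on the machine the script runs on.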
Thanks,

KevinH

Quote:
Originally Posted by tkeo
Hi Kevin,

I have tested with 10 mobi files, 2 of which have HD images and 1 of which has no RESC. The split files are identical to the ones generated by the older mobi_split.py.

I have fixed a bug in taginfo_toxml() of mobi_k8resc.py and modified mobi_header.py.

In dump_contexth(cpage, extheader), I have changed these entries to:
508 : 'Unknown_Title_Furigana?_(508)',
517 : 'Unknown_Creator_Furigana?_(517)',
522 : 'Unknown_Publisher_Furigana?_(522)',
The corresponding entries in class MobiHeader are not changed.


I have modified this part too, since int('0xffffffff') cannot convert the string to a long integer: without an explicit base, int() parses the string as base 10.
Code:
>>> int('0xffffffff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0xffffffff'
>>>
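For reference, a small sketch of the usual ways around that error. This is only an illustration of Python's built-in int() and is not taken from the patch: passing base 16 accepts the '0x' prefix, and base 0 asks int() to infer the base from the prefix.

```python
text = '0xffffffff'

as_hex = int(text, 16)   # explicit base 16; the '0x' prefix is allowed
auto = int(text, 0)      # base 0: infer the base from the prefix

# without a base, the string is parsed as decimal and rejected
try:
    int(text)
    base10_ok = True
except ValueError:
    base10_ok = False
```

Both as_hex and auto compare equal to the literal 0xffffffff, so either form can be tested against the "missing" value directly.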
I attach a patch. Hopefully, it is the final patch!

BTW,
prefs.py has CRLF line ending instead of LF.

Take care,
tkeo

Last edited by KevinH; 07-14-2014 at 12:03 PM.