Hi Kovid
I was basing that on the parsing done by mobiunpack.py to get the starting offset of each section. The difference between consecutive starting offsets determines each section's length.
Since section 0 contains the extended header, its size is the difference between the starting positions of section 0 and section 1.
For my test case under Calibre this provides:
going to load section 0 now
loading section 0
before: 2912 and after: 3472
as the starting and ending offsets. That gives a size of 3472 - 2912 = 560 bytes for the extended header (section 0).
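The arithmetic is just a difference of consecutive offsets. A minimal sketch (Python 3; the offset table and function name are made up for illustration, with values matching the Calibre run above):

```python
# Hypothetical offset table standing in for the values mobiunpack.py
# unpacks from the PDB record list.
offsets = (2912, 3472)

def section_size(offsets, n):
    # A section ends where the next one begins.
    return offsets[n + 1] - offsets[n]

print(section_size(offsets, 0))  # -> 560
```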
For my test case under KindleGen this provides:
loading section 0
before: 3816 and after: 12484
as the starting and ending offsets. That gives a size of 12484 - 3816 = 8668 bytes.
Perhaps there is a bug in how mobiunpack.py computes sections, but if you actually open the KindleGen-produced book in emacs, you can see the nearly 8000 bytes of nulls right where it says they should be.
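Rather than eyeballing it in emacs, the padding claim is easy to check programmatically. A hedged sketch (Python 3; the simulated data below just mimics a section with a long null run, it is not read from a real file):

```python
def longest_null_run(data):
    # Length of the longest run of NUL bytes in data.
    best = run = 0
    for b in data:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best

# Simulated section 0: a few real header bytes followed by ~8000 bytes
# of null padding, like the KindleGen case described above.
blob = b'\x10DUMMYHDR' + b'\x00' * 8000 + b'\xff'
print(longest_null_run(blob))  # -> 8000
```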
Here is the (Python 2) code snippet that does the sectioning in mobiunpack.py, for what it is worth.
Code:
import struct

class Sectionizer:
    def __init__(self, filename, perm):
        self.f = file(filename, perm)
        header = self.f.read(78)
        self.ident = header[0x3C:0x3C+8]
        self.num_sections, = struct.unpack_from('>H', header, 76)
        print "number of sections ", self.num_sections
        sections = self.f.read(self.num_sections*8)
        # Each 8-byte record entry is (offset, attributes + unique ID);
        # keep only the offsets and append a sentinel end marker.
        self.sections = struct.unpack_from('>%dL' % (self.num_sections*2), sections, 0)[::2] + (0xfffffff, )
        for z in xrange(self.num_sections):
            print z, " ", self.sections[z]

    def loadSection(self, section):
        print "loading section ", section
        # A section runs from its own offset to the next section's offset.
        before, after = self.sections[section:section+2]
        print "before: ", before, " and after: ", after
        self.f.seek(before)
        return self.f.read(after - before)