Old 03-26-2012, 04:00 PM   #2
KevinH
Hi,

Check out the much simpler DumpMobiHeader.py:

http://www.mobileread.com/forums/sho...63&postcount=8

I have no idea why only the text length would be useful, since it is the uncompressed length, may include CSS files and SVG snippets (in a KF8 Mobi), and the text still needs to be processed to get back to what is needed as input (for both older Mobis and newer KF8 Mobis). The actual text is stored in separate sections, with trailing byte sequences, in other sections of the Palm database file (a .mobi is a Palm database file).

If you can read C/C++, you will have no problem reading Python; the only issue is that Python uses whitespace indentation to indicate what is part of a loop, an if statement, or any other block. If you examine DumpMobiHeader.py, you will see the following:
Code:
    mobi6_header = {
            'compression_type'  : (0x00, '>H', 2),
            'fill0'             : (0x02, '>H', 2),
            'text_length'       : (0x04, '>L', 4),
            'text_records'      : (0x08, '>H', 2),
            'max_section_size'  : (0x0a, '>H', 2),
            'crypto_type'       : (0x0c, '>H', 2),
            'fill1'             : (0x0e, '>H', 2),
            'magic'             : (0x10, '4s', 4),
            'header_length'     : (0x14, '>L', 4),
            'type'              : (0x18, '>L', 4),
            ...
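Each entry in that table is (offset, struct format, size in bytes). As a rough sketch of how such a table could be applied with Python's struct module (this is not the actual code from DumpMobiHeader.py; the read_fields helper and the sample bytes are made up for illustration):

```python
import struct

# Subset of the table above: name -> (offset, struct format, size in bytes)
mobi6_header = {
    'compression_type': (0x00, '>H', 2),
    'text_length':      (0x04, '>L', 4),
    'text_records':     (0x08, '>H', 2),
    'magic':            (0x10, '4s', 4),
}

def read_fields(header, table):
    """Decode every field listed in `table` from the bytes object `header`.

    Hypothetical helper, not part of DumpMobiHeader.py.
    """
    values = {}
    for name, (offset, fmt, size) in table.items():
        # unpack_from returns a tuple; each format here yields exactly one value
        (values[name],) = struct.unpack_from(fmt, header, offset)
    return values

# Synthetic 20-byte header invented for this example, not taken from a real ebook.
# Field order: compression_type, fill0, text_length, text_records,
# max_section_size, crypto_type, fill1, magic
sample = struct.pack('>HHLHHHH4s', 2, 0, 123456, 31, 4096, 0, 0, b'MOBI')

fields = read_fields(sample, mobi6_header)
print(fields['text_length'], fields['magic'])
```

The '>' in the format strings means big-endian, 'H' is an unsigned 16-bit value, 'L' an unsigned 32-bit value, and '4s' a 4-byte string.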
The 'magic' value is 'MOBI'. So, assuming you want nothing else, the easiest way to find the text_length is to open the ebook in any editor, look for the first string 'MOBI' that comes *after* 'BOOKMOBI' near the front of the ebook, and then step back exactly 12 bytes (0x10 - 0x04) to reach the beginning of the text_length field, which is stored as a 4-byte big-endian value.
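The same search can be done in a few lines of Python. This is only a minimal sketch of the shortcut described above: the function name find_text_length and the sample bytes are invented for illustration, and it assumes no stray 'MOBI' byte sequence happens to appear between 'BOOKMOBI' and the real header.

```python
import struct

def find_text_length(data):
    """Return text_length from raw .mobi bytes, or None if the markers are missing.

    Hypothetical helper implementing the "find MOBI, step back 12 bytes" shortcut.
    """
    start = data.find(b'BOOKMOBI')
    if start < 0:
        return None
    # Skip past 'BOOKMOBI' itself (it contains 'MOBI') and find the next 'MOBI'
    magic = data.find(b'MOBI', start + len(b'BOOKMOBI'))
    if magic < 12:
        # Not found (-1), or too close to the start to step back 12 bytes
        return None
    # text_length sits at 0x04, magic at 0x10, so 12 bytes before the magic
    (text_length,) = struct.unpack_from('>L', data, magic - 12)
    return text_length

# Synthetic data invented for this example: padding, 'BOOKMOBI', more padding,
# then a 20-byte header whose text_length field is 123456.
header = struct.pack('>HHLHHHH4s', 2, 0, 123456, 31, 4096, 0, 0, b'MOBI')
sample = b'\x00' * 60 + b'BOOKMOBI' + b'\x00' * 100 + header

print(find_text_length(sample))
```

For a real file you would read the bytes first, for example with open('book.mobi', 'rb') (the filename is just a placeholder), and pass the result of read() to the function.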

A better method would be to play around with DumpMobiHeader.py and examine actual ebook files in any good hex editor to understand how it works.

You can also read our own MobileRead Wiki page about the Mobi format, which will help.

Last edited by KevinH; 03-26-2012 at 04:20 PM.