MobileRead Forums - View Single Post

nickredding · 07-31-2011, 03:28 PM

Quote:

Originally Posted by kovidgoyal

Never mind. Looking at the TBS bytes from that document, their structure is completely different from kindlegen 1.2 TBS entries, so you'd have to decode them from scratch. The info you'll need will all be present in the decompiled_nyt/ dir.

Not true. The TBS bytes generated by Kindlegen 1.1 and 1.2 are identical. I have attached my own parsing of them, produced with a modified version of a Python script called mobiunpack (also attached). I don't understand the output from your debug code. For example, for NYT.MOBI your code seems to say the TBS bytes for the first record are 80 0 80 80 (from tbs_indexing.txt).

Code:

******************** TBS Indexing (27 records) ********************

Record #1: Starts at: 0 Ends at: 4095
	Contains: 3 index entries (0 ends, 0 complete, 3 starts)
TBS bytes: 80 0 80 80
	Starts:
		Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 121, Size: 107660) [Periodical]
		Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 568, Size: 76568) [The Front Page]
		Index Entry: 3 (Parent index: 1, Depth: 2, Offset: 2968, Size: 13248) [Amid New Talks, Some Optimism on Debt Crisis]

TBS: 0 (0000)
Outermost index: 0
Unknown extra start bytes: {}
The section at the start of this record is: 0
First article in this record of section 0 (relative to its parent section): 0 [0 absolute index]
The section 0 has at most one article in this record

My parsing shows 86 80 02 A0 85 for that same first record, as in:

Code:

    PACKED HTML Record[  0]  Base =         0h [        0 ]  Size =   7B0h [   1968 ]
**Unpacked HTML Record[  0]  0 - 4099   TBS =  86 80 02 A0 85
       TBS HTML Record       86 80 02 A0 85
Decode TBS HTML Record       Type 6 <first section article, ncx=idx+1>
                             20h(idx=2 flags=0) NCX[3] HTML = 2968 - 16215, parent=1, flags=6, flagdata=0
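
To make the mismatch concrete, here is a rough Python sketch that compares the two byte strings and reproduces the "Type 6" and "20h(idx=2 flags=0)" readings from the dump above. The helper names (parse_hex_bytes, split_byte) are made up for illustration and are not taken from mobiunpack. The bit layout (clear the top bit, then treat the high nibble as idx and the low nibble as flags) is only an assumption inferred from that one line of output, not a confirmed description of the TBS format.

```python
def parse_hex_bytes(text):
    """Turn a string such as '86 80 02 A0 85' into a list of ints."""
    return [int(token, 16) for token in text.split()]


def split_byte(b):
    """Hypothetical split of one TBS byte.

    Assumption (inferred from the dump, not confirmed): bit 7 is a
    marker bit, and the remaining 7 bits hold idx in the high nibble
    and flags in the low nibble.
    """
    value = b & 0x7F          # drop the top bit
    idx = value >> 4          # high nibble
    flags = value & 0x0F      # low nibble
    return value, idx, flags


debug_tbs = parse_hex_bytes("80 0 80 80")      # from tbs_indexing.txt
my_tbs = parse_hex_bytes("86 80 02 A0 85")     # from the mobiunpack dump

print("same bytes?", debug_tbs == my_tbs)
for b in my_tbs:
    value, idx, flags = split_byte(b)
    print(f"{b:02X} -> value={value:02X}h idx={idx} flags={flags}")
```

Under that assumption, the first byte 86 gives value 6 (the "Type 6" line) and the byte A0 gives 20h with idx=2 and flags=0. The other bytes are printed the same way only for comparison; I am not claiming they use this layout.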