07-31-2011, 06:33 PM   #49
kovidgoyal
creator of calibre
FYI, from what I've seen, this is how the trailing data flags are to be interpreted:

0b1 (lowest bit) - Multibyte overlap characters, of the form

<number of characters><char1><char2><char3>

where the number of characters is encoded in the bottom two bits of a single byte. Each charN here is one byte of a multibyte character that overlaps the record boundary; there can never be more than three of them, since the maximum character size in UTF-8 is 4 bytes.

0b10 - Indexing trailing bytes
0b100 - Uncrossable breaks
Higher bits unknown
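
As a rough illustration of the bit assignments listed above, here is a minimal Python sketch that names the set bits in a flags value. The names KNOWN_FLAGS and describe_flags are made up for this example and are not taken from any tool; anything above 0b100 is simply reported as unknown.

```python
# Hypothetical helper: names are illustrative, not from any existing tool.
KNOWN_FLAGS = {
    0b1: "multibyte overlap characters",
    0b10: "indexing trailing bytes",
    0b100: "uncrossable breaks",
}


def describe_flags(flags: int) -> list[str]:
    """Return a description for every set bit, lowest bit first."""
    names = []
    bit = 1
    while bit <= flags:
        if flags & bit:
            # Bits above 0b100 have no known meaning, so label them as such.
            names.append(KNOWN_FLAGS.get(bit, f"unknown (0b{bit:b})"))
        bit <<= 1
    return names


print(describe_flags(0b11))
```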

The trailing data for each bit >= 0b10 is encoded in the form

<data><size>

where size is a backward-encoded vwi (variable width integer) that gives the full size of that entry, including the bytes used to encode the size itself.

The trailing data at the end of the record is of the form:

<multibyte><entry1><entry2>...

Each entryN corresponds to a set bit in the extra data flags. The lowest-order set bit corresponds to the outermost entry, the next set bit to the entry inside it, and so on.
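
The <data><size> layout can be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not the code from any tool: it assumes a backward vwi carries 7 bits per byte and is read from the end of the record until a byte with the high bit (0x80) set is reached, and it takes "outermost" to mean "last in the record", so the entry for 0b10 is stripped first. The function names are hypothetical. The multibyte entry for bit 0b1 is deliberately left in the returned record.

```python
def decode_backward_vwi(record: bytes) -> tuple[int, int]:
    """Decode a variable width integer from the END of record.

    Assumed convention: 7 payload bits per byte; reading backward, the
    byte with the high bit set is the last one belonging to the integer.
    Returns (value, number of bytes consumed).
    """
    value = 0
    shift = 0
    consumed = 0
    for byte in reversed(record):
        value |= (byte & 0x7F) << shift
        shift += 7
        consumed += 1
        if byte & 0x80:
            break
    return value, consumed


def strip_sized_entries(record: bytes, flags: int) -> tuple[bytes, dict[int, bytes]]:
    """Strip the <data><size> entry for every set flag bit >= 0b10.

    Returns the shortened record and a mapping of flag bit -> entry data
    (without the size bytes). The 0b1 multibyte entry is not touched.
    """
    entries = {}
    bit = 0b10
    while bit <= flags:
        if flags & bit:
            size, consumed = decode_backward_vwi(record)
            # size covers the data plus the size bytes themselves
            if size < consumed or size > len(record):
                raise ValueError(f"bad trailing entry size {size} for bit 0b{bit:b}")
            end = len(record)
            entries[bit] = record[end - size:end - consumed]
            record = record[:end - size]
        bit <<= 1
    return record, entries


# Made-up record: text, then the 0b100 entry, then the 0b10 entry (outermost).
raw = b"hello" + b"\xaa\xbb\x83" + b"\x01\x82"
text, found = strip_sized_entries(raw, 0b110)
print(text, found)
```

In the made-up record, 0x83 and 0x82 are single-byte sizes (3 and 2) with the high bit set, so each entry is its data plus one size byte.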

If you are in doubt about whether you got the trailing data right, check that the size of the text after decompressing equals the declared text size in the header (4096 bytes in every MOBI file I've seen).

My inspect MOBI tool puts the bytes from each text record in the text/ subdirectory.