FYI, from what I've seen, this is how the trailing data flags are to be interpreted:
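lowest bit (0b1) = multibyte overlap bytes, of the form
<overlap byte1><overlap byte2><overlap byte3><count>
where count is a single byte whose bottom two bits give the number of overlap bytes in front of it, so the whole entry is (count & 0b11) + 1 bytes long. There can never be more than three overlap bytes, since the maximum size of a UTF-8 character is 4 bytes and at least the first byte of a character split across records is in the record's own text.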
0b10 = Indexing trailing bytes
0b100 = Uncrossable breaks
Higher bits = unknown
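A quick Python sketch of the flag bits above. The names FLAG_NAMES, describe_flags and multibyte_entry_size are made up for illustration. The multibyte part assumes that the count byte is the last byte of that entry, which is what you need when parsing backwards from the end of the record. It also assumes that all other trailing entries have already been stripped. The example bytes are invented: text ending in the first byte of "€" (E2 82 AC), followed by two overlap bytes and a count of 2.

```python
# Trailing data flag bits as listed above; the names are just labels.
FLAG_NAMES = {
    0b001: "multibyte overlap bytes",
    0b010: "indexing trailing bytes",
    0b100: "uncrossable breaks",
}


def describe_flags(flags: int) -> list[str]:
    """Return a label for every set bit, lowest bit first."""
    labels = []
    bit = 1
    while bit <= flags:
        if flags & bit:
            labels.append(FLAG_NAMES.get(bit, f"unknown bit {bit:#b}"))
        bit <<= 1
    return labels


def multibyte_entry_size(data: bytes) -> int:
    """Size of the multibyte entry at the end of data.

    Assumes every other trailing entry has already been stripped, so the
    last byte is the count byte: its bottom two bits give the number of
    overlap bytes (0-3), plus 1 for the count byte itself.
    """
    return (data[-1] & 0b11) + 1


print(describe_flags(0b1010))                        # → ['indexing trailing bytes', 'unknown bit 0b1000']
print(multibyte_entry_size(b"text\xe2\x82\xac\x02"))  # → 3
```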
The trailing data for each bit >= 0b10 is encoded in the form
<data><size>
where size is a backward-encoded variable-width integer (vwi) that gives the full size of the entry, including the bytes used to encode the size itself.
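Here is a sketch of decoding such a backward vwi in Python. It assumes the convention I believe the readers use. Bytes are read from the end of the entry towards the start, and each one contributes its low 7 bits, least significant group first. Reading stops at the byte whose high bit (0x80) is set. decode_backward_vwi and encode_backward_vwi are hypothetical helper names, and the 4-byte cap is a safety limit rather than something taken from the format.

```python
def decode_backward_vwi(data: bytes, end: int) -> int:
    """Decode a backward-encoded vwi whose last byte is data[end - 1].

    Bytes are read from the end towards the start. Each one contributes its
    low 7 bits, least significant group first, and the byte with the high
    bit (0x80) set is the last one read. At most 4 bytes are read.
    """
    value = 0
    shift = 0
    pos = end
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 or shift >= 28:
            break
    return value


def encode_backward_vwi(value: int) -> bytes:
    """Inverse of decode_backward_vwi: 7-bit groups, most significant first,
    with the high bit set on the first byte (the last one a backward read sees)."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    groups[0] |= 0x80
    return bytes(groups)


# A made-up entry: two data bytes followed by its size (3, counting the size byte).
entry = b"\x10\x20" + encode_backward_vwi(3)
print(decode_backward_vwi(entry, len(entry)))  # → 3
print(encode_backward_vwi(200).hex())          # → 8148
```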
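The trailing data at the end of the record is of the form:
<multibyte><entry1><entry2>...
The multibyte part is present only if bit 0b1 is set, and each entryN corresponds to a set bit >= 0b10 in the trailing data flags. The lowest-order of those bits is the outermost entry (the one at the very end of the record), the next set bit is just inside it, and so on.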
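Putting the layout together, this is a sketch of stripping all trailing data from a text record, given the trailing data flags from the MOBI header. The function names are hypothetical. The sketch assumes the backward vwi convention (the high bit is set on the byte that ends the backward read). It also assumes that the multibyte count byte is the last byte of the multibyte entry. The record in the example is made up: "caf" plus the first byte of "é" (C3 A9), one overlap byte, an entry for bit 0b100, and the outermost entry for bit 0b10.

```python
def decode_backward_vwi(data: bytes, end: int) -> int:
    """Backward vwi ending at data[end - 1]: low 7 bits per byte, least
    significant group first, stop at the byte with 0x80 set (max 4 bytes)."""
    value, shift, pos = 0, 0, end
    while pos > 0:
        pos -= 1
        value |= (data[pos] & 0x7F) << shift
        shift += 7
        if data[pos] & 0x80 or shift >= 28:
            break
    return value


def trailing_data_size(record: bytes, flags: int) -> int:
    """Total size of <multibyte><entry1><entry2>... at the end of record."""
    size = len(record)
    total = 0
    # Entries for bits >= 0b10: the lowest set bit is the outermost entry,
    # so walk the bits upwards while peeling entries off the end.
    bits = flags >> 1
    while bits:
        if bits & 1:
            # The decoded size already includes the vwi bytes themselves.
            total += decode_backward_vwi(record, size - total)
        bits >>= 1
    # The multibyte entry (bit 0b1) is innermost, right after the text.
    if flags & 1:
        total += (record[size - total - 1] & 0b11) + 1
    return total


def strip_trailing_data(record: bytes, flags: int) -> bytes:
    return record[:len(record) - trailing_data_size(record, flags)]


record = b"caf\xc3" + b"\xa9\x01" + b"\xaa\x82" + b"\x10\x20\x30\x84"
print(trailing_data_size(record, 0b111))   # → 8
print(strip_trailing_data(record, 0b111))  # → b'caf\xc3'
```

Slicing with len(record) - n rather than [:-n] keeps the zero-trailing-data case working, since [:-0] would return an empty string.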
If you are in doubt about whether you got the trailing data right, check that each text record, after you strip its trailing data and decompress the rest, is exactly as long as the text record size declared in the header (4096 bytes in every MOBI file I've seen). Only the last text record can be shorter.
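That sanity check can be sketched like this. The sketch assumes the text records use PalmDOC compression (compression type 2 in the PalmDOC header). HUFF/CDIC-compressed files would need a different decompressor. All function names are hypothetical, and the trailing-data helper repeats the same assumptions about the vwi and multibyte layout. Run it on every text record except the last one.

```python
def decode_backward_vwi(data: bytes, end: int) -> int:
    value, shift, pos = 0, 0, end
    while pos > 0:
        pos -= 1
        value |= (data[pos] & 0x7F) << shift
        shift += 7
        if data[pos] & 0x80 or shift >= 28:
            break
    return value


def strip_trailing_data(record: bytes, flags: int) -> bytes:
    """Remove the <multibyte><entry1><entry2>... trailing data from a record."""
    size, total, bits = len(record), 0, flags >> 1
    while bits:                      # entries for bits >= 0b10, outermost first
        if bits & 1:
            total += decode_backward_vwi(record, size - total)
        bits >>= 1
    if flags & 1:                    # multibyte entry: overlap bytes + count byte
        total += (record[size - total - 1] & 0b11) + 1
    return record[:size - total]


def palmdoc_decompress(data: bytes) -> bytes:
    """PalmDOC (LZ77-style) decompression."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 1 <= c <= 8:                  # copy the next c bytes literally
            out += data[i:i + c]
            i += c
        elif c <= 0x7F:                  # 0x00 and 0x09-0x7F: literal byte
            out.append(c)
        elif c >= 0xC0:                  # a space followed by (c ^ 0x80)
            out += bytes((0x20, c ^ 0x80))
        else:                            # 0x80-0xBF: two-byte back-reference
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance, length = pair >> 3, (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid back-reference")
            for _ in range(length):      # byte by byte: source may overlap output
                out.append(out[-distance])
    return bytes(out)


def text_record_ok(record: bytes, flags: int, record_size: int = 4096) -> bool:
    """True if the record decompresses to exactly record_size bytes once its
    trailing data is stripped."""
    return len(palmdoc_decompress(strip_trailing_data(record, flags))) == record_size
```

A wrong flags value usually shows up immediately: with a tiny made-up record such as b"ab\x80\x11\x00" (which decompresses to "ababab" plus a 1-byte multibyte entry), text_record_ok(record, 0b1, record_size=6) is True. With flags 0, the leftover byte makes the length 7 and the check fails.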
My inspect MOBI tool puts the bytes from each text record in the text/ subdirectory.