03-12-2010, 06:40 PM   #1
pdurrant
Problems with multibyte data flags

Mobipocket files have a way of coping with multi-byte (UTF-8) characters that get split across a block boundary. A flag at the end of the block indicates that this has happened, and also supplies the missing bytes (the ones at the start of the next block).

See

https://wiki.mobileread.com/wiki/MOBI...racter_overlap
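
To make the overlap mechanism concrete, here is a minimal Python sketch of how I read the wiki description. It assumes the record really does end with a multibyte entry and has no other trailing entries, that the low two bits of the final byte give the number of overlap bytes, and that those bytes sit immediately before the final byte. The function name is made up for illustration.

```python
from typing import Tuple


def split_multibyte_trailer(record: bytes) -> Tuple[bytes, bytes]:
    """Split a text record into (text, overlap_bytes).

    Assumption: the record ends with a multibyte overlap entry, laid out
    as <overlap bytes><size byte>, where the low two bits of the size
    byte hold the number of overlap bytes.
    """
    if not record:
        return record, b""
    overlap_len = record[-1] & 0x03      # number of overlap bytes (0..3)
    entry_len = overlap_len + 1          # overlap bytes plus the size byte
    if entry_len > len(record):
        raise ValueError("trailing entry is longer than the record")
    text = record[:len(record) - entry_len]
    overlap = record[len(record) - entry_len:len(record) - 1]
    return text, overlap


# A record whose text ends on the first byte of a two-byte UTF-8 character.
text, overlap = split_multibyte_trailer(b"caf\xc3" + b"\xa9" + b"\x01")
print((text + overlap).decode("utf-8"))
```

If the file does not actually carry these entries, the same code will silently chop real text off the end of every block, which is why knowing whether they are present matters.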

Now, the problem I'm having is that I can't find a way to determine whether these multibyte flags are present in a Mobipocket file. I used to think that a test of Mobipocket version (>5) and header length (>=0xE8) was necessary and sufficient. But now I've come across a Mobipocket version 5 file (with header length 0xE8) that /does/ have the multibyte flags. And yet I have another file, also version 5 and with header length 0xE8, that /doesn't/ have the multibyte flags (despite the bytes at offset 242 seeming to indicate that it should).
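
For anyone who wants to compare files, here is a rough Python sketch of the fields involved. It is only the test described above, not a fix. The offsets are assumptions taken from my reading of the wiki page: all are relative to the start of record 0, with the "MOBI" identifier at 16, header length at 20, file version at 36, and a two-byte flags field at 242 whose lowest bit signals the multibyte entries. The function and key names are made up.

```python
import struct


def inspect_multibyte_hints(record0: bytes) -> dict:
    """Collect the header fields that hint at multibyte trailing entries.

    Assumed layout (offsets from the start of record 0, big-endian):
    "MOBI" at 16, header length at 20, file version at 36,
    extra data flags (2 bytes) at 242.
    """
    if record0[16:20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    (header_len,) = struct.unpack_from(">I", record0, 20)
    (version,) = struct.unpack_from(">I", record0, 36)

    flags = 0
    # Only read the flags if the header is long enough to contain them.
    if 16 + header_len >= 244 and len(record0) >= 244:
        (flags,) = struct.unpack_from(">H", record0, 242)

    return {
        "version": version,
        "header_len": header_len,
        "flags": flags,
        # The version/length test described above, exactly as stated.
        "version_length_test": version > 5 and header_len >= 0xE8,
        # What the flags field at offset 242 claims (assumed: bit 0).
        "flag_bit_set": bool(flags & 0x0001),
    }
```

Printing this for the two problem files should at least show where the version/length test and the flag bit disagree with what the data blocks actually contain.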

Does anyone have any idea of how to reliably determine whether the data blocks in a Mobipocket file have or don't have the multibyte flags at the end of each block?

In hope...