03-12-2010, 06:40 PM | #1 |
The Grand Mouse 高貴的老鼠
Posts: 71,496
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Problems with multibyte data flags
Mobipocket files have a method of coping with multi-byte (UTF-8) characters that get broken across a block boundary. A flag at the end of the block indicates that this has happened and also supplies the missing bytes (which sit at the start of the next block).
See https://wiki.mobileread.com/wiki/MOBI...racter_overlap
Now, the problem I'm having is that I can't find a way to determine whether these multibyte flags are present in a Mobipocket file. I used to think that a test of Mobipocket version (>5) and header length (>=0xE8) was necessary and sufficient.
But now I've come across a Mobipocket version 5 file (with header length 0xE8) that /does/ have the multibyte flags. And yet I have another file, also version 5 and header length 0xE8, that /doesn't/ have the multibyte flags (despite the bytes at offset 242 seeming to indicate that it should).
Does anyone have any idea how to reliably determine whether the data blocks in a Mobipocket file have the multibyte flags at the end of each block?
In hope... |
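For anyone following along, the block-boundary problem described above can be reproduced without any Mobipocket file at all. The sketch below is only an illustration of the general UTF-8 issue, not the Mobipocket trailer format itself: the helper name `missing_tail_bytes` and the 4-byte block size are made up for the example.

```python
def missing_tail_bytes(block: bytes) -> int:
    """Return how many bytes the last UTF-8 character in `block` still needs.

    Hypothetical helper for illustration; 0 means the block ends on a
    complete character (or on a byte that is not a valid UTF-8 lead byte).
    """
    # Walk back over at most three continuation bytes (10xxxxxx)
    # until the lead byte of the final character is found.
    for back in range(1, min(4, len(block)) + 1):
        byte = block[-back]
        if byte & 0xC0 == 0x80:
            continue                 # continuation byte, keep looking
        if byte & 0x80 == 0x00:
            return 0                 # plain ASCII, nothing missing
        if byte & 0xE0 == 0xC0:
            need = 2                 # 110xxxxx starts a 2-byte character
        elif byte & 0xF0 == 0xE0:
            need = 3                 # 1110xxxx starts a 3-byte character
        elif byte & 0xF8 == 0xF0:
            need = 4                 # 11110xxx starts a 4-byte character
        else:
            return 0                 # not a valid lead byte
        return max(need - back, 0)
    return 0


# Split some UTF-8 text into fixed-size blocks (4 bytes here, purely
# to force a character to straddle the boundary).
data = "ab高貴".encode("utf-8")
block0, block1 = data[:4], data[4:]

n = missing_tail_bytes(block0)
# Borrowing the first n bytes of the next block completes the character,
# which is the job the trailing multibyte data does in a Mobipocket block.
completed = (block0 + block1[:n]).decode("utf-8")
print(n, completed)
```

Decoding `block0` on its own would raise `UnicodeDecodeError`, which is why a reader needs the extra bytes supplied at the end of the block.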
03-13-2010, 06:13 AM | #2 | |
The Grand Mouse 高貴的老鼠
Posts: 71,496
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
So it seems the test for whether the trailing data flags are valid is actually quite simple: version 5 or greater, and header length 0xE4 or greater. Although, since I've never seen a version 5 Mobipocket file with a header length less than 0xE4, the latter condition might be redundant.
I've updated the wiki to clarify the situation for anyone else fool enough to poke about in Mobipocket file innards. |
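A minimal sketch of that test in Python, for illustration only. The version and header-length thresholds and the flag offset of 242 come from this thread. The other offsets (header length at byte 20 and file version at byte 36 of record 0, both big-endian) and the assumption that the lowest flag bit is the multibyte one are my reading of the wiki layout, so treat them as assumptions. The function names are made up.

```python
import struct

# Offsets into record 0 of the file. 242 is mentioned in the thread;
# the other two are assumptions based on the wiki's header layout.
HEADER_LENGTH_OFFSET = 20    # 4 bytes, big-endian (assumed)
VERSION_OFFSET = 36          # 4 bytes, big-endian (assumed)
EXTRA_FLAGS_OFFSET = 242     # 2 bytes, big-endian
MULTIBYTE_FLAG = 0x0001      # assumed: lowest bit marks multibyte data


def trailing_flags(record0: bytes) -> int:
    """Return the trailing data flags, or 0 when they should be ignored."""
    if len(record0) < EXTRA_FLAGS_OFFSET + 2:
        return 0             # record too short to hold the flags
    (header_length,) = struct.unpack_from(">L", record0, HEADER_LENGTH_OFFSET)
    (version,) = struct.unpack_from(">L", record0, VERSION_OFFSET)
    # The test from this post: version 5 or greater and
    # header length 0xE4 or greater.
    if version >= 5 and header_length >= 0xE4:
        (flags,) = struct.unpack_from(">H", record0, EXTRA_FLAGS_OFFSET)
        return flags
    return 0


def has_multibyte_flag(record0: bytes) -> bool:
    """True when the flags are valid and the (assumed) multibyte bit is set."""
    return bool(trailing_flags(record0) & MULTIBYTE_FLAG)
```

Returning 0 for older or shorter headers means callers can treat "flags not valid" and "no trailing data" the same way.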
|
03-13-2010, 09:11 AM | #3 |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Paul,
Nice that you found it! Will you please add your fixes to the extraction routines on our Decoder tool Google source code site? I will then use hg to sync the changes and include them.
Thanks!
KevinH |
03-13-2010, 09:44 AM | #4 |
The Grand Mouse 高貴的老鼠
Posts: 71,496
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
03-13-2010, 09:49 AM | #5 |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
03-13-2010, 10:39 AM | #6 |
The Grand Mouse 高貴的老鼠
Posts: 71,496
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
|