MobileRead Forums - View Single Post

tomsem · 08-05-2010, 05:56 PM

There's a document called 'Unicode 5.1 Test Publication' which someone converted to ePub and which I converted to mobi. All of these Romanian characters display correctly on my K2. I also converted the sample referenced above and it looks fine.

But whether any given text works will depend on whether the Unicode is properly normalized. "ă" can't be encoded, for example, as 'a' followed by the combining breve (U+0306, the combining form of '˘'). Rather, it should use the precomposed character (in this example, U+0103, or C4 83 in UTF-8). So canonical equivalence is not good enough; the text needs to be canonically composed (NFC) as well.
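If you want to check or fix a source file before converting, something like this Python sketch should do it. It only uses the standard `unicodedata` module; the helper name `to_precomposed` is my own, and I'm assuming the text is already decoded into a Python string. It illustrates the normalization step only, not anything the Kindle or the converter does internally.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Return text in NFC form (canonically composed)."""
    return unicodedata.normalize("NFC", text)


# 'a' followed by COMBINING BREVE (U+0306): canonically equivalent to 'ă',
# but stored as two code points.
decomposed = "a\u0306"
precomposed = to_precomposed(decomposed)

print(len(decomposed), len(precomposed))  # 2 1
print(" ".join(f"{b:02X}" for b in precomposed.encode("utf-8")))  # C4 83
```

Text that is already in NFC passes through unchanged, so it should be safe to run this over a whole document.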

I love using the word 'canonical.'
