Old 06-12-2010, 08:30 PM   #20
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Quote:
Originally Posted by brewt View Post
Code:
ям†
is an example of what you get when the 3-byte UTF-8 encoding (EF AC 86) of U+FB06, the st ligature, is read one byte at a time in a single-byte encoding. The bytes themselves are correct; something in your chain has mis-interpreted them and attempted to convert them. Hard to say if that's Word or Dreamweaver, but Word seems to produce HTML with the correct escape sequence for the ligature character. It seems calibre has somehow managed to separate this code back into the letters s and t, but can't do the same for the other mangled codes.
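As a rough illustration of how this kind of mangling arises, here is a minimal Python sketch. It assumes the wrong encoding was Windows-1252, which is only a guess; the actual culprit in your chain may be a different single-byte code page, and the variable names are purely illustrative.

```python
# U+FB06 is LATIN SMALL LIGATURE ST.
ligature = "\ufb06"

# Its correct UTF-8 encoding is the three bytes EF AC 86.
utf8_bytes = ligature.encode("utf-8")

# Assumption: some tool read those bytes as Windows-1252,
# turning one character into three unrelated ones.
mangled = utf8_bytes.decode("cp1252")

# Reversing the same mistake recovers the original ligature.
repaired = mangled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(), repr(mangled), repaired == ligature)
```

If the round trip recovers the ligature, the text was only mis-decoded once; if it fails, the file has probably been through more than one bad conversion.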

Most ligatures (like swash caps, text figures and other typographic variants) are not part of the Unicode spec* and you can't rely on programs to recognise such font-specific alternative characters. Only a handful of compatibility ligatures, U+FB06 among them, have code points of their own. If you want to use them, make sure they're embedded as explicit escape sequences from the start.

*[Edit] Unlike useful stuff like Linear B (which died out around 1100 B.C.) and 38 different types of arrow... The lack of Unicode code points for text figures is especially annoying.
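For the advice about explicit escape sequences, a small Python sketch may help. It shows one way to write the st ligature as an HTML numeric character reference so that it survives any ASCII-compatible encoding, and one way the ligature can be split back into plain letters. Whether calibre actually uses compatibility normalisation for this is an assumption on my part, and the sample word is made up.

```python
import unicodedata

# "first" written with the U+FB06 st ligature (illustrative sample text).
text = "fir\ufb06"

# Replace every non-ASCII character with a numeric character reference.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# NFKC normalisation decomposes the ligature into the letters s and t.
plain = unicodedata.normalize("NFKC", text)

print(escaped)
print(plain)
```

The escaped form is pure ASCII, so no later tool in the chain can mis-decode it, which is the point of embedding the escape sequence from the start.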

Last edited by charleski; 06-12-2010 at 09:50 PM.