Quote:
Originally Posted by brewt
|
is an example of the mangled 3-byte code (EFAC86) that has resulted from something in your chain misinterpreting the 2-byte FB06 code for the st ligature and attempting to convert it. Hard to say if that's Word or Dreamweaver, but Word seems to produce HTML with the correct escape sequence for a private-use character. It seems calibre has somehow managed to separate this code back into the letters s and t, but can't do the same for the other mangled codes.
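For anyone who wants to poke at this, here is a rough Python sketch of one way the three bytes can turn up. EF AC 86 is what U+FB06 looks like once it is encoded as UTF-8, so a plausible (but unconfirmed) explanation is that some step read those bytes with a single-byte code page. The use of cp1252 is an assumption, and so is the idea that calibre gets back to "s" and "t" through a compatibility normalisation such as NFKC; nothing in the thread confirms either.

```python
import unicodedata

ligature = "\ufb06"  # U+FB06 LATIN SMALL LIGATURE ST

# UTF-8 needs three bytes for this code point.
utf8_bytes = ligature.encode("utf-8")
print(utf8_bytes.hex().upper())  # EFAC86

# Assumption: a tool in the chain decoded those bytes as Windows-1252,
# which turns one character into three unrelated ones.
mangled = utf8_bytes.decode("cp1252")
print(repr(mangled))

# If nothing else has touched the text, the damage is reversible.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == ligature)  # True

# Compatibility normalisation splits the ligature into plain letters.
letters = unicodedata.normalize("NFKC", ligature)
print(letters)  # st
```

If the round trip in the third step fails on your real file, the text has probably been through more than one bad conversion and this simple repair won't be enough.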
Ligatures (like swash caps, text figures and other typographic variants) are not part of the UTF spec* and you can't rely on programs to recognise such font-specific alternative characters. If you want to use them, make sure they're embedded as explicit escape sequences from the start.
*[Edit] Unlike useful stuff like Linear B (which died out around 1100 B.C.) and 38 different types of arrow...
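As a rough illustration of the "explicit escape sequences" advice, the sketch below rewrites every non-ASCII character as an HTML numeric character reference, so that later tools only ever see plain ASCII. The helper name `to_char_refs` is made up for this example, and the approach assumes the text has already been read in correctly.

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' is a standard error handler for str.encode().
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_char_refs("fa\ufb06"))  # fa&#64262;
```

Plain ASCII text passes through unchanged, so running this over a whole HTML file only touches the special characters.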

The lack of UTF codes for text figures is especially annoying.