Quote:
Originally Posted by meme
The file is encoded okay as UTF-8, and the nbsp characters are stored in the file as the two bytes C2 A0 (or 302 240 octal), as you can see when you examine the file with a hex editor. When Sigil opens the file it correctly identifies it as UTF-8 and then asks Qt to convert the file to Unicode. Unfortunately, for some reason, although Qt appears to convert everything else OK, it converts those two bytes to a standard space (20) instead of the nbsp character (A0). So the nbsp characters are removed before the rest of Sigil can see them and convert them to the entity.
@meme
Are you sure the nbsp is stored as C2 A0? When I looked at it with a hex editor, it showed as A0 00 (presumably you swap the bytes for endianness and read it as simply 00 A0, or 000 160 decimal). This would be correct:
From
http://www.w3.org/TR/html4/sgml/entities.html :
Code:
<!ENTITY nbsp CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
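For what it's worth, here is a minimal Python sketch (not Sigil or Qt code; the helper name `hex_bytes` is made up for illustration) showing how U+00A0 is serialized in UTF-8 and in little-endian UTF-16. It is only a guess that the two hex dumps differ because of the encoding, but the byte patterns do match the two values quoted above:

```python
# Sketch: how U+00A0 (no-break space) looks in two common encodings.
# hex_bytes is a hypothetical helper, not part of Sigil or Qt.
NBSP = "\u00a0"


def hex_bytes(text, encoding):
    """Return the encoded bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


utf8_hex = hex_bytes(NBSP, "utf-8")         # two-byte UTF-8 sequence
utf16le_hex = hex_bytes(NBSP, "utf-16-le")  # one little-endian 16-bit unit

print("UTF-8    :", utf8_hex)
print("UTF-16LE :", utf16le_hex)
print("code point (decimal):", ord(NBSP))

# Decoding the UTF-8 bytes should give back U+00A0,
# not an ordinary space (U+0020).
decoded = bytes([0xC2, 0xA0]).decode("utf-8")
print("decoded is nbsp:", decoded == NBSP)
```

If the file really contains C2 A0, a correct UTF-8 decode yields code point 160, which is what the entity declaration above refers to.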
BobC