05-16-2013, 06:24 PM   #49
BobC
Quote:
Originally Posted by meme

The file is encoded okay as UTF-8, and the nbsp characters are stored in the file as the two bytes C2 A0 (302 240 in octal), as you can see when you examine the file with a hex editor. When Sigil opens the file, it correctly identifies it as UTF-8 and then asks Qt to convert it to Unicode. Unfortunately, for some reason, although it appears to convert everything else OK, Qt converts the two bytes to a standard space (20) instead of the nbsp character (A0). So the nbsp characters are removed before the rest of Sigil can see them and convert them to the &nbsp; entity.
@meme

Are you sure the nbsp is stored as C2 A0? When I looked at it with a hex editor, it showed as A0 00 (presumably swap the bytes for endianness and read it as simply 00 A0, or 000 160 decimal). This would be correct:


From http://www.w3.org/TR/html4/sgml/entities.html :
Code:
<!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
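
For anyone who wants to check the two byte patterns being discussed, here is a minimal Python sketch. It is not taken from Sigil or Qt, and the variable names are only for illustration. It shows that the same character, U+00A0 (decimal 160), comes out as C2 A0 when encoded as UTF-8 and as A0 00 when encoded as little-endian UTF-16. So, assuming the hex editor shows raw file bytes, seeing A0 00 would point to a UTF-16 file (or a UTF-16 view of the text) rather than UTF-8.

```python
# U+00A0 NO-BREAK SPACE, the character behind the &nbsp; entity
nbsp = "\u00a0"

# The same code point under two different encodings
utf8_bytes = nbsp.encode("utf-8")          # expected: C2 A0
utf16le_bytes = nbsp.encode("utf-16-le")   # expected: A0 00

print(utf8_bytes.hex(" ").upper())
print(utf16le_bytes.hex(" ").upper())
print(ord(nbsp))  # the code point as a decimal number

# A correct UTF-8 decode of C2 A0 gives back U+00A0,
# not an ordinary space (U+0020)
decoded = bytes([0xC2, 0xA0]).decode("utf-8")
print(decoded == nbsp, decoded == " ")
```

If the file really contains C2 A0, a correct UTF-8 decode should hand back U+00A0, which is the behaviour meme says is not happening in this case.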
BobC

Last edited by BobC; 05-16-2013 at 06:43 PM.