05-14-2013, 01:26 PM   #37
meme
Sigil developer
Quote:
Originally Posted by roger64
On writer2xhtml options, there is a mysterious line -for me- about named entities. Do you think it could be good to tick it? (see arrow on screenshot).
You can try it, but at least when I tested it, it did not help, even though it seems like it should.

First, thanks for the epub file. It clearly shows that the nbsp characters are in the file before it is opened by Sigil, and it gave me something to test with.

The issue appears (pending further checking) to be with Qt5 (not much of a surprise), possibly due to a change from Qt4 to Qt5.

The file is encoded okay as UTF-8, and each nbsp character is stored in the file as the two bytes C2 A0 in hex (302 240 in octal, 194 160 in decimal), as you can see when you examine the file with a hex editor. When Sigil opens the file it correctly identifies the file as UTF-8 and then asks Qt to convert the file to Unicode. Unfortunately, for some reason, although it appears to convert everything else ok, Qt converts the 2 bytes to a standard space (hex 20) instead of the nbsp character (hex A0). So the nbsp characters are removed before the rest of Sigil can see them and convert them to the nbsp entity.
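
To make the byte values concrete, here is a small Python sketch (not Sigil or Qt code, just an illustration) that encodes U+00A0 as UTF-8 and prints the resulting bytes in hex, octal, and decimal. It also shows what a correct decode should give back: U+00A0, not an ordinary space.

```python
# U+00A0 is the no-break space (nbsp) character.
nbsp = "\u00a0"

# Encode it as UTF-8; this yields a two-byte sequence.
encoded = nbsp.encode("utf-8")

# The same two bytes shown in three number bases.
hex_bytes = [format(b, "02X") for b in encoded]
octal_bytes = [format(b, "o") for b in encoded]
decimal_bytes = list(encoded)

# A correct UTF-8 decode round-trips to U+00A0, not to U+0020 (plain space).
decoded = encoded.decode("utf-8")

print("hex:    ", hex_bytes)
print("octal:  ", octal_bytes)
print("decimal:", decimal_bytes)
print("decoded code point: U+%04X" % ord(decoded))
```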

A short test shows that we can make a specific check for this 2 byte pair in UTF-8 files and map each pair to the nbsp entity before doing the conversion to Unicode, thus preserving the nbsps in the file. It still needs to be checked to make sure it doesn't break anything else, and to see if there is a better workaround.
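
The idea behind the workaround can be sketched roughly as follows. This is a Python illustration only, not the actual Sigil change: the helper name `protect_nbsp` is hypothetical, and the choice of `&#160;` as the entity text is an assumption made for the example.

```python
NBSP_UTF8 = b"\xc2\xa0"   # UTF-8 byte pair for U+00A0
NBSP_ENTITY = b"&#160;"   # assumption: numeric entity used as the replacement


def protect_nbsp(raw: bytes) -> str:
    """Hypothetical helper: swap nbsp byte pairs for an entity, then decode.

    Only meant for data already identified as UTF-8.
    """
    # Validate first; raises UnicodeDecodeError if raw is not valid UTF-8.
    raw.decode("utf-8")
    # Replace on the raw bytes, before any conversion to Unicode happens.
    return raw.replace(NBSP_UTF8, NBSP_ENTITY).decode("utf-8")


sample = "1\u00a0000 km".encode("utf-8")
print(protect_nbsp(sample))
```

In valid UTF-8 the byte C2 is always the lead byte of a two-byte sequence, so the pair C2 A0 can only ever mean U+00A0. Other characters whose encoding merely ends in A0 (for example U+0120, encoded as C4 A0) are left alone by the byte-level replace.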