Sigil, UTF-8 and the emdash
I have just run across a strange occurrence with Sigil. I loaded an HTML file into Sigil and noticed that all of the emdashes (—) had disappeared. The original file had 250 occurrences of the emdash.
I went back to the original HTML file and changed the character set from Windows 1252: Western European to UTF-8 (which Sigil uses), and all of the emdashes disappeared. I then went back to Windows 1252: Western European, replaced every (—) with &#8212; , converted back to UTF-8, and all of the emdashes reappeared. I then loaded the file into Sigil and all of the emdashes were present. This appears to be a UTF-8 problem.
As a pre-process, all HTML files will have to be edited before being loaded into Sigil, unless someone has come up with a workaround. Are there any other characters to watch for?
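For anyone who wants to script that pre-process rather than do it by hand, here is a minimal sketch in Python. It assumes the source file really is Windows 1252 throughout, and the function name cp1252_to_entities is just something made up for illustration. It decodes the bytes as Windows 1252 and then writes every non-ASCII character as a numeric reference, so it should catch not only the emdash but also curly quotes, endashes, ellipses and the like.

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Decode Windows-1252 bytes and replace each non-ASCII character
    with a numeric character reference such as &#8212;.

    Assumption: the input is valid Windows-1252. Bytes that are undefined
    in that code page (e.g. 0x81) will raise UnicodeDecodeError.
    """
    text = raw.decode("cp1252")
    # "xmlcharrefreplace" turns any character that ASCII cannot hold
    # into &#NNNN; using its decimal code point.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # 0x97 is the emdash in Windows-1252.
    sample = b"<p>Wait\x97what?</p>"
    print(cp1252_to_entities(sample))  # <p>Wait&#8212;what?</p>
```

The result is plain ASCII, so it reads the same whether the file is later treated as Windows 1252 or UTF-8, which is why the manual replacement described above survived the conversion.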
Curiouser and curiouser!!
Last edited by crutledge; 06-29-2010 at 03:28 PM.