MobileRead Forums

DaleDe · 04-02-2008, 02:09 AM

Quote:

Originally Posted by jgray

This issue of quotes, mdashes and such turning into strange characters is why I use the numeric representations in my HTML markup. For example, for a left-double-curly-quote, & # 8220 ; and & # 8221 ; for the right. This is the only way to guarantee that non-ASCII characters will display properly on different systems.

This is especially true with XHTML, as the only such characters defined by name are & lt ;, & gt ;, and & amp ; (I think there are a few more, I just don't remember them right now). Not even & nbsp ; is defined for XHTML.

Note that I had to insert spaces on each of those tags. The BBS software shows the character and not the tag that I entered, even if I wrap them in CODE tags.
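As a rough illustration of the numeric-reference approach described in the quote, here is a minimal Python sketch. The function name to_numeric_refs and the sample string are made up for this example; the only library feature relied on is Python's standard "xmlcharrefreplace" error handler, which replaces anything that does not fit in ASCII with a decimal reference.

```python
def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference.

    Hypothetical helper for illustration only; it is not taken from any
    tool mentioned in this thread.
    """
    # 'xmlcharrefreplace' is a standard Python codec error handler that
    # emits a decimal character reference for each unencodable character.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Curly quotes are written as escapes so this source file stays pure ASCII.
sample = "\u201cHello\u201d"
print(to_numeric_refs(sample))
```

Plain ASCII text passes through unchanged, so something like this can be run over a whole HTML file before it goes into a conversion tool, assuming the file was decoded correctly in the first place.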

The problem was that the characters were not written as named or numeric character codes; they were raw UTF-8. To understand this, take the number 8220 from your example and convert it to binary. In hexadecimal this is 201C, or 0010000000011100 in binary, and it probably gets byte swapped. When the encoding is not recognized, the raw value gets converted into 3 very different characters, as shown in the problem. The original source possibly, or even probably, used character codes like you do, which was fine, but it was compiled into the Mobi internal format, then disassembled back into HTML, and then converted to imp. The problem has been fixed by recognizing the UTF-8 character set.
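To make the arithmetic concrete, here is a small Python sketch of the conversion for 8220. It assumes the misreading software treated the bytes as Windows-1252 (cp1252); the thread does not say which single-byte encoding was actually involved, so that part is a guess for illustration. The variable names are made up for this example.

```python
code_point = 8220                # left double curly quote, from the example
ch = chr(code_point)

hex_form = format(code_point, "04X")    # hexadecimal form of the number
bin_form = format(code_point, "016b")   # the same value as 16 binary digits

# UTF-8 stores this character as three separate bytes.
utf8_bytes = ch.encode("utf-8")

# Assumption: a reader that does not recognize UTF-8 and decodes byte by
# byte as Windows-1252 turns each of the three bytes into its own character.
misread = utf8_bytes.decode("cp1252")

print(hex_form, bin_form)
print(utf8_bytes.hex(" "), len(misread))
```

Under that assumption the single quote mark comes out as three unrelated characters, which matches the kind of garbage described in the problem. Decoding the same bytes as UTF-8 gives back the original character.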

Dale