View Full Version : Non-standard characters


jbenny
11-18-2007, 12:20 AM
Since we've had some discussion on such character entities as em-dash and zero width non-joiners in several different threads, I thought I would make a list of the most often used HTML character entities, as listed on the W3.org site. I have attached both an HTML file and a Mobipocket file, so that others can use them for testing.
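
For anyone who would rather generate a similar test page than edit one by hand, here is a minimal sketch in Python. It is not the attached file: the sample list of names and the function name `build_test_page` are assumptions made for illustration, and only the standard-library table `html.entities.name2codepoint` (the HTML 4 entity names) is relied on.

```python
from html.entities import name2codepoint

# Hypothetical sample; the attached list covers many more entities.
SAMPLE_NAMES = ["nbsp", "shy", "alpha", "omega", "oline", "ensp", "emsp",
                "thinsp", "zwnj", "zwj", "ndash", "mdash"]


def build_test_page(names):
    """Return an HTML page with one table row per named entity."""
    rows = []
    for name in names:
        codepoint = name2codepoint[name]  # raises KeyError for unknown names
        rows.append(
            f"<tr><td>&amp;{name};</td>"   # entity name, shown literally
            f"<td>&amp;#{codepoint};</td>"  # numeric form, shown literally
            f"<td>&{name};</td></tr>"       # the entity as the reader renders it
        )
    return (
        "<html><head><title>Character entity test</title></head><body>\n"
        "<table border=\"1\">\n"
        "<tr><th>Entity</th><th>Numeric</th><th>Rendered</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )


if __name__ == "__main__":
    print(build_test_page(SAMPLE_NAMES))
```

The output is pure ASCII, so the result does not depend on the file encoding, only on how the reading software and font handle each entity.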

On my own Windows XP system, I see that Firefox displays them all. IE6 misses a few of the Greek letters and the first seven under Symbols and General Punctuation. FBReader misses almost the same characters as IE6, except that it does get the three spaces under General Punctuation and misses the next four characters. This is particularly interesting, as the exact same set of fonts is available to all of these programs on my PC.

I also viewed the Mobipocket file on a Palm emulator, with real Mobipocket reader software. On the Palm, practically all the Greek letters are wrong, as are several of the Symbols and a few of the General Punctuation characters.

As for the oft-mentioned zero-width non-joiners, only Firefox and the Palm got this right. That's a shame, as zwnj would solve some of the display problems with hyphenation, as has been suggested by others.

wallcraft
11-18-2007, 02:43 AM
How do you tell if the nbsp and the first seven under Symbols and General Punctuation are correct? Perhaps these need use cases to demonstrate they are working.

For FBReader, which characters get displayed depends on both the encoding and the font used. Under Windows, I got the same result as you did using the Bitstream Vera Sans font, but switching to Lucida Sans Unicode got all the characters (except perhaps the punctuation). The font is selected under the Styles tab of the Preferences (Options) icon.

jbenny
11-18-2007, 03:43 AM
I just added some dashes around those space characters to see if they were different. On Firefox, the en-space, em-space and thin-space are all different widths. The nbsp is larger than a thin-space, but smaller than an en-space, so I assume that is correct. IE6 displays the en-space, em-space and thin-space the same - about the size of an en-space.
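
The dash test described above can be sketched in Python as follows. The helper names `describe` and `dash_test_line` are made up for this example; `html.unescape` and `unicodedata.name` are standard-library calls, used here only to show which code point each entity stands for.

```python
import html
import unicodedata

SPACE_ENTITIES = ["nbsp", "ensp", "emsp", "thinsp"]


def describe(name):
    """Return (entity name, code point, Unicode name) for an HTML entity."""
    char = html.unescape(f"&{name};")
    return name, f"U+{ord(char):04X}", unicodedata.name(char)


def dash_test_line(name):
    """Return an HTML paragraph with the entity between two hyphens."""
    return f"<p>{name}: -&{name};-</p>"


if __name__ == "__main__":
    for name in SPACE_ENTITIES:
        print(*describe(name))
        print(dash_test_line(name))
```

A no-break space normally has the width of an ordinary space, which fits the observation that it falls between the thin space and the en space.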

I figured that there were probably some fonts that would display the characters better than others. The main reason for posting the files was to make everyone aware that using all of these characters is still problematic and to let them test the characters on different configurations. Proper display depends on the operating system, reader software and font. It is good that most of these do display correctly, so we can feel relatively safe in using most of them.

Edit: I was mistaken about the zwnj and zwj in Firefox. They aren't correct. I get a vertical line and a vertical line with an x at the top. And I noticed that the soft-hyphen displays nothing in both FF and IE6.
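
One way to check which entities are not supposed to draw anything at all is to look at their Unicode general category. The sketch below assumes the entity names from the HTML 4 list; `is_format_character` is a hypothetical helper. Characters in category `Cf` (format) such as zwnj, zwj and the soft hyphen have no mark of their own, so a visible vertical line most likely means the software is drawing a placeholder glyph from the font.

```python
import html
import unicodedata

# Assumption: these are among the entities discussed in this thread.
NAMES = ["shy", "zwnj", "zwj", "lrm", "rlm", "ensp", "mdash"]


def is_format_character(name):
    """True if the entity maps to a Unicode format character (category Cf)."""
    char = html.unescape(f"&{name};")
    return unicodedata.category(char) == "Cf"


if __name__ == "__main__":
    for name in NAMES:
        kind = "invisible format character" if is_format_character(name) else "visible or spacing"
        print(f"&{name};  {kind}")
```

For the format characters, "correct" display therefore means seeing nothing where the character sits, apart from its effect on joining, direction or line breaking.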

jbenny
11-18-2007, 04:35 AM
Here is an updated character list. I have marked in red the characters that don't display correctly in at least one of FF, IE6 or FBReader. The characters that didn't display correctly in the Palm emulator with Mobipocket are marked in green. Note that in FF/IE6/FBReader I used the standard fonts Arial, Times New Roman and Courier New, as these are the most common on the PC. On the Palm emulator, I used whatever the default font was.

It would be interesting to see what results others get on different operating systems and using different reader software. In fact, it would be particularly interesting to see what the results are on dedicated readers.

If you would like to try this and report back, just list which characters did not display correctly. Please include the operating system you are using, the font you are using (let's stick to the default sans-serif, serif and monospaced fonts on your system) and the reading software. For dedicated readers, just the reader model and font should be sufficient.

If we could get a good cross-section of responses, it would be helpful to know which characters we can count on being supported in our ebooks across platforms.
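
If reports do come in, combining them is a simple set operation. The sketch below is only an illustration: the `Report` fields mirror what is requested above (operating system, reading software, font, failed characters), and the sample reports in the `__main__` section are placeholders, not real results.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Report:
    system: str
    software: str
    font: str
    failed: frozenset = field(default_factory=frozenset)


def safe_entities(tested, reports):
    """Return the tested entities that no report lists as failing, in order."""
    failed_anywhere = set()
    for report in reports:
        failed_anywhere |= report.failed
    return [name for name in tested if name not in failed_anywhere]


if __name__ == "__main__":
    # Placeholder data for illustration only; not actual test results.
    tested = ["mdash", "ndash", "shy", "zwnj"]
    reports = [
        Report("OS A", "Reader A", "default serif", frozenset({"zwnj"})),
        Report("OS B", "Reader B", "default sans-serif", frozenset({"shy", "zwnj"})),
    ]
    print(safe_entities(tested, reports))
```

An entity counts as safe here only if every reported configuration displayed it, which is the strictest reading of "supported across platforms".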

jbenny
11-18-2007, 07:47 AM
Digital Editions on Windows XP does a pretty good job with this list. The only Latin-1 character missed is the soft-hyphen. All the Greek characters are good. Among the Symbols, only overline doesn't display. Under General Punctuation, the first seven are no good. This was using the default serif font, which I presume is Times New Roman, as I specified no font in the epub document.

Maybe I should see if a PDF is any different.

jmurphy
11-18-2007, 03:50 PM
jbenny wrote: "And I noticed that the soft-hyphen displays nothing in both FF and IE6."

Not sure about IE6, but in IE7 a soft-hyphen is not visible unless wrapping causes it to be the last character on a line.

If you grab the right-hand edge of the browser window and slowly drag it left, the soft-hyphen will become visible when it is the last character before the line-wrap.

JMmurpy

DaleDe
11-18-2007, 05:56 PM
jmurphy wrote: "Not sure about IE6, but in IE7 a soft-hyphen is not visible unless wrapping causes it to be the last character on a line. If you grab the right-hand side of the browser and slowly move left, the soft-hyphen will be visible when it becomes the last character before the line-wrap."

That is exactly how it should behave. If you can see it when not at the end of a line then it is not working correctly.

Dale
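
The soft-hyphen behaviour described in the last two posts can be imitated with a toy line wrapper in Python. This is a simplified sketch, not how any browser is implemented: it counts characters instead of measuring rendered widths, and the function name `wrap_soft` is made up for the example.

```python
SHY = "\u00ad"  # soft hyphen, &shy; in HTML


def wrap_soft(text, width):
    """Greedy wrap; a soft hyphen shows as '-' only where a line breaks."""
    lines, current = [], ""
    for word in text.split(" "):
        parts = word.split(SHY)  # fragments between soft hyphens
        while parts:
            prefix = current + " " if current else ""
            whole = "".join(parts)
            if len(prefix + whole) <= width:
                # The rest of the word fits: soft hyphens stay invisible.
                current = prefix + whole
                parts = []
                continue
            # Look for the longest leading run of fragments that fits
            # together with a visible hyphen.
            fitted = 0
            for count in range(len(parts) - 1, 0, -1):
                if len(prefix + "".join(parts[:count]) + "-") <= width:
                    fitted = count
                    break
            if fitted:
                lines.append(prefix + "".join(parts[:fitted]) + "-")
                current = ""
                parts = parts[fitted:]
            elif current:
                # No break opportunity fits; start a new line and retry.
                lines.append(current)
                current = ""
            else:
                # Nothing fits even on an empty line; let the word overflow.
                current = whole
                parts = []
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    sample = f"hy{SHY}phen{SHY}ation"
    for width in (20, 8):
        print(width, wrap_soft(sample, width))
```

With a wide line the word comes out unhyphenated, and narrowing the width makes a hyphen appear only at the point where the word is actually broken, which matches what was observed in IE7.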