11-17-2007, 11:20 PM   #1
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Non-standard characters

Since we've had some discussion on such character entities as em-dash and zero width non-joiners in several different threads, I thought I would make a list of the most often used HTML character entities, as listed on the W3.org site. I have attached both an HTML file and a Mobipocket file, so that others can use them for testing.
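For anyone who would rather regenerate a test page like this than rely on the attachment, here is a minimal Python sketch. It is not how the attached files were produced; the function name `build_entity_table` and the short sample list are illustrative assumptions. It uses only the standard `html.entities.name2codepoint` table, which covers the HTML 4 named entities.

```python
from html.entities import name2codepoint

# Hypothetical sample; the attached files cover many more entities.
SAMPLE_ENTITIES = ["nbsp", "shy", "alpha", "omega", "ensp", "emsp",
                   "thinsp", "zwnj", "zwj", "ndash", "mdash", "oline"]


def build_entity_table(names):
    """Return an HTML table with one row per named entity."""
    rows = []
    for name in names:
        codepoint = name2codepoint[name]  # KeyError if the name is unknown
        rows.append(
            "<tr><td>&amp;{0};</td><td>&amp;#{1};</td>"
            "<td>U+{1:04X}</td><td>[&{0};]</td></tr>".format(name, codepoint)
        )
    header = ("<tr><th>Named</th><th>Numeric</th>"
              "<th>Code point</th><th>Rendered</th></tr>")
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


if __name__ == "__main__":
    print(build_entity_table(SAMPLE_ENTITIES))
```

The brackets around the rendered character make it easier to spot invisible or zero-width characters that a reader draws as a visible glyph.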

On my own Windows XP system, Firefox displays them all. IE6 misses a few of the Greek letters and the first seven under Symbols and General Punctuation. FBReader misses almost the same characters as IE6, except that it does display the three spaces under General Punctuation and misses the next four characters. This is particularly interesting, as the exact same set of fonts is available to all of these programs on my PC.

I also viewed the Mobipocket file on a Palm emulator, with the real Mobipocket Reader software. On the Palm, practically all the Greek letters are wrong, as are several of the Symbols and a few of the General Punctuation characters.

As for the oft-mentioned zero-width non-joiners, only Firefox and the Palm got this right. That's a shame, as zwnj would solve some of the display problems with hyphenation, as has been suggested by others.
Attached Files
File Type: zip HTML-Characters.zip (8.1 KB, 598 views)

Last edited by jbenny; 11-17-2007 at 11:23 PM.
11-18-2007, 01:43 AM   #2
wallcraft
reader
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
How do you tell if the nbsp and the first seven under Symbols and General Punctuation are correct? Perhaps these need use cases to demonstrate they are working.

For FBReader, which characters get displayed depends on both the encoding and the font used. Under Windows, I got the same result as you did using the Bitstream Vera Sans font, but switching to Lucida Sans Unicode displayed all the characters (except perhaps the punctuation). The font is selected under the Styles tab of the Preferences (Options) icon.
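The encoding half of that can be checked outside the reader. The following is a rough Python sketch; the function name `unencodable` and the sample characters are illustrative assumptions, not anything FBReader does internally. It reports which characters cannot be stored directly in a given encoding and would therefore have to be written as named entities or numeric references. It says nothing about whether a font has the glyph, which still has to be checked by eye.

```python
# Hypothetical sample of characters from the entity list.
SAMPLE = {
    "mdash": "\u2014",
    "alpha": "\u03b1",
    "thinsp": "\u2009",
    "zwnj": "\u200c",
    "nbsp": "\u00a0",
}


def unencodable(chars, encoding):
    """Return the names whose character cannot be encoded in `encoding`."""
    missing = []
    for name, char in chars.items():
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    for enc in ("latin-1", "cp1252", "utf-8"):
        print(enc, unencodable(SAMPLE, enc))
```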
11-18-2007, 02:43 AM   #3
jbenny
I just added some dashes around those space characters to see if they were different. On Firefox, the en-space, em-space and thin-space are all different widths. The nbsp is larger than a thin-space, but smaller than an en-space, so I assume that is correct. IE6 displays the en-space, em-space and thin-space the same - about the size of an en-space.
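The dash trick is easy to reproduce. Here is a small Python sketch that writes one test line per space entity, with a visible marker on each side so the widths can be compared by eye. The function name `space_test_lines` and the choice of `-` as the marker are assumptions for illustration; the attached file may be laid out differently.

```python
SPACE_ENTITIES = ["nbsp", "thinsp", "ensp", "emsp"]


def space_test_lines(names, marker="-"):
    """Wrap each space entity in visible markers, one test line per entity."""
    lines = []
    for name in names:
        # The escaped copy (&amp;name;) labels the line with the entity name.
        lines.append("<p>{m}&{n};{m} &amp;{n};</p>".format(m=marker, n=name))
    return "\n".join(lines)


if __name__ == "__main__":
    print(space_test_lines(SPACE_ENTITIES))
```

If a reader shows the same gap on every line, it is probably substituting one space for all of them, which matches what IE6 does with the en-space, em-space and thin-space.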

I figured that there were probably some fonts that would display the characters better than others. The main reason for posting the files was to make everyone aware that using all of these characters is still problematic and to let them test the characters on different configurations. Proper display depends on the operating system, reader software and font. It is good that most of these do display correctly, so we can feel relatively safe in using most of them.

Edit: I was mistaken about the zwnj and zwj in Firefox. They aren't correct. I get a vertical line and a vertical line with an x at the top. And I noticed that the soft-hyphen displays nothing in both FF and IE6.

Last edited by jbenny; 11-18-2007 at 02:48 AM.
11-18-2007, 03:35 AM   #4
jbenny
Here is an updated character list. I have marked in red the characters that don't display correctly in FF, IE6 or FBReader. The characters that didn't display correctly in the Palm emulator with Mobipocket are marked in green. Note that in FF/IE6/FBReader I used the standard fonts Arial, Times New Roman and Courier New, as these are the most common on the PC. On the Palm emulator, I used whatever the default font was.

It would be interesting to see what results others get on different operating systems and using different reader software. In fact, it would be particularly interesting to see what the results are on dedicated readers.

If you would like to try this and report back, just list which characters did not display correctly. Please include the operating system you are using, the font you are using (let's stick to the default sans-serif, serif and monospaced fonts on your system) and the reading software. For dedicated readers, just the reader model and font should be sufficient.

If we could get a good cross-section of responses, it would be helpful to know which characters we can count on being supported in our ebooks across platforms.
Attached Files
File Type: zip HTML Character Entities.zip (3.3 KB, 519 views)

Last edited by jbenny; 11-18-2007 at 03:40 AM.
11-18-2007, 06:47 AM   #5
jbenny
One more data point

Digital Editions on Windows XP does a pretty good job with this list. The only Latin-1 character missed is the soft-hyphen. All the Greek characters are good. Among the Symbols, only overline doesn't display. Under General Punctuation, the first seven are no good. This was using the default serif font, which I presume is Times New Roman, as I specified no font in the epub document.

Maybe I should see if a PDF is any different.
11-18-2007, 02:50 PM   #6
jmurphy
Zealot
Posts: 110
Karma: 1133068
Join Date: Sep 2007
Device: ipaq
Quote:
Originally Posted by jbenny View Post
And I noticed that the soft-hyphen displays nothing in both FF and IE6.
Not sure about IE6, but in IE7 a soft-hyphen is not visible unless wrapping causes it to be the last character on a line.

If you grab the right-hand side of the browser and slowly move left, the soft-hyphen will be visible when it becomes the last character before the line-wrap.

JMmurpy
11-18-2007, 04:56 PM   #7
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by jmurphy View Post
Not sure about IE6, but in IE7 a soft-hyphen is not visible unless wrapping causes it to be the last character on a line.

If you grab the right-hand side of the browser and slowly move left, the soft-hyphen will be visible when it becomes the last character before the line-wrap.

JMmurpy
That is exactly how it should behave. If you can see it when not at the end of a line then it is not working correctly.
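That rule can be modelled in a few lines of Python. The sketch below only illustrates the behaviour described above; it is not how IE7 or any other browser implements line breaking. The function `render_line_break` and its column-counting notion of `room` are hypothetical simplifications, since real renderers measure glyph widths rather than counting characters.

```python
SHY = "\u00ad"  # soft hyphen, written &shy; in HTML


def render_line_break(word, room):
    """Split `word` at the last soft hyphen that fits in `room` columns.

    Returns (head, tail). The head ends with a visible "-" only when a
    break is actually taken; every unused soft hyphen is dropped, because
    it must stay invisible when it is not at the end of a line.
    """
    parts = word.split(SHY)
    plain = "".join(parts)
    if len(plain) <= room:
        return plain, ""  # whole word fits: no hyphen is shown
    head = ""
    for i in range(1, len(parts)):
        candidate = "".join(parts[:i])
        if len(candidate) + 1 <= room:  # +1 for the visible hyphen
            head = candidate
    if not head:
        return "", plain  # no break point fits: move the whole word down
    return head + "-", plain[len(head):]


if __name__ == "__main__":
    word = "hy" + SHY + "phen" + SHY + "a" + SHY + "tion"
    for room in (20, 8, 7, 2):
        print(room, render_line_break(word, room))
```

Narrowing `room` here plays the same role as dragging the edge of the browser window: the hyphen appears only once the word no longer fits on the line.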

Dale