04-05-2011, 05:47 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Amazon Kindle
|
Epub format, B & N PubIt!, and HTML character entities
I've been working on learning epub format for some time and thought I'd mastered the basics when I finally got to the point where my ebooks could pass epubcheck 1.1. However, when I uploaded a book to Barnes & Noble's PubIt! recently, I noticed that certain 3-digit ISO 8859/1 character entities that went smoothly through epubcheck and rendered as intended in Calibre and on the Kindle were displayed as question marks in the Nook preview! The workaround was to replace them with the corresponding 4-digit Unicode character entities, but now I wonder which type to use for other devices like the iPad and the Sony Reader.
Five special characters are involved that I know of: the smart (curly) single and double quotes and the en dash. To my surprise, the copyright symbol does render as intended in the Nook preview. In general, 3-digit numeric character entities can be relied on to be supported by most ebook readers, but this particular set was also an exception to that rule in Microsoft Reader.

Here's the list of special characters, the 3-digit character entities, and their 4-digit equivalents:

left single quote &#145; &#8216;
right single quote &#146; &#8217;
left double quote &#147; &#8220;
right double quote &#148; &#8221;
en dash &#150; &#8211;

Here's an example of the HTML and how it renders.

HTML: &#147;Stop in the name of the law &#150; I recognize you, prisoner &#145;94621&#146;,&#148; he cried out!

Intended rendering: “Stop in the name of the law – I recognize you, prisoner ‘94621’,” he cried out!

Nook preview: ?Stop in the name of the law ? I recognize you, prisoner ? 94621?,? he cried out!

The 3-digit character entity that does render properly in the Nook preview is &#169;, the copyright symbol.

By the way, one thing I can't rule out completely is that this rendering issue is due to a recent update to the Nook software. The two ebooks I released earlier for the Nook now turn out on examination to have the ?'s too, and I'm surprised that I overlooked them during my initial Nook previewing, which I thought was pretty thorough.

Comments, anyone? Have I missed some subtlety of epub format?
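For anyone who wants to automate the workaround described above, here is a minimal sketch in Python. It assumes the only problem references are the five listed in the post; the function name `fix_cp1252_refs` and the mapping table are illustrative and not part of any epub tool.

```python
import re

# The five 3-digit codes from the post, mapped to the Unicode code points
# of the same characters (decimal 8216, 8217, 8220, 8221, 8211).
CP1252_TO_UNICODE = {
    145: 0x2018,  # left single quote
    146: 0x2019,  # right single quote
    147: 0x201C,  # left double quote
    148: 0x201D,  # right double quote
    150: 0x2013,  # en dash
}

# Matches decimal numeric character references such as &#147;
_DECIMAL_REF = re.compile(r"&#(\d+);")


def fix_cp1252_refs(html: str) -> str:
    """Rewrite the listed 3-digit references as 4-digit Unicode references."""

    def repl(match: re.Match) -> str:
        code = int(match.group(1))
        if code in CP1252_TO_UNICODE:
            return f"&#{CP1252_TO_UNICODE[code]};"
        # Leave every other reference (for example &#169;) untouched.
        return match.group(0)

    return _DECIMAL_REF.sub(repl, html)
```

Running it over the example sentence would turn `&#147;Stop` into `&#8220;Stop`, while `&#169;` passes through unchanged because 169 is already the Unicode code point for the copyright symbol.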
04-06-2011, 04:46 AM | #2 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Use Unicode references everywhere (or input the characters directly in UTF-8) and it should be fine; otherwise you are asking for problems, even if it sometimes works (mainly because you are lucky). Or use real entities: &ldquo; &rdquo; &lsquo; &rsquo; &ndash; &copy;
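If you prefer the "Unicode references everywhere" route, a rough sketch of converting named entities to numeric ones might look like the following. It relies on Python's standard `html.entities.name2codepoint` table; the helper name `named_to_numeric` is made up for illustration, and the choice to leave the XML predefined entities alone is an assumption about what most epub files need.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; these are left as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &ldquo; or &copy;
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(html: str) -> str:
    """Replace known named entities with decimal Unicode references."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown or XML-predefined names are passed through unchanged.
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return _NAMED_REF.sub(repl, html)
```

With this, `&ldquo;` becomes `&#8220;` and `&copy;` becomes `&#169;`, while `&amp;` stays as it is.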
04-06-2011, 10:18 PM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Amazon Kindle
|
Thank you for the helpful advice. I got away from using named character entities like &ldquo; some time ago because they weren't always properly rendered, whereas the numeric ones were. For example, in my original post on this board, & didn't render properly and I had to use &.
It's interesting that epubcheck 1.1 doesn't catch non-Unicode character entities, and that, given that the 3-digit codes are Windows ones, they were never rendered properly by Microsoft Reader! Last edited by jlandahl; 04-06-2011 at 11:24 PM. |
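Since epubcheck 1.1 lets these through, a small pre-flight check can flag them before upload. This is only a sketch: it assumes that any numeric reference in the range 128 to 159 is suspect (in Unicode those code points are control characters, not punctuation), and the function name `find_c1_refs` is hypothetical.

```python
import re

# Matches decimal (&#147;) and hexadecimal (&#x93;) numeric references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|(\d+));")


def find_c1_refs(html: str) -> list[tuple[int, str]]:
    """Return (offset, reference) pairs for references in the 128-159 range."""
    hits = []
    for match in _NUMERIC_REF.finditer(html):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 128 <= code <= 159:
            hits.append((match.start(), match.group(0)))
    return hits
```

An empty result means no references of this kind were found; anything else gives the character offset of each one so it can be corrected by hand or with a conversion script.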
04-07-2011, 04:38 AM | #4 | ||
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Quote:
|
||
|