04-05-2011, 05:47 PM   #1
jlandahl
Junior Member
jlandahl began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Amazon Kindle
Epub format, B & N PubIt!, and HTML character entities

I've been working on learning epub format for some time and thought I'd mastered the basics when I finally got to the point where my ebooks could pass epubcheck 1.1. However, when I uploaded a book to Barnes & Noble's PubIt! recently, I noticed that certain 3-digit ISO 8859/1 character entities that went smoothly through epubcheck and rendered as intended in Calibre and on the Kindle were displayed as question marks in the Nook preview! The workaround was to replace them with the corresponding 4-digit Unicode character entities, but now I wonder which type to use for other devices like the iPad and the Sony Reader.

Five special characters are involved that I know of. These are the ones for smart or slanted single and double quotes and the one for the en dash. To my surprise, the one for the copyright symbol does render as intended in the Nook preview. In general, 3-digit numeric character entities can be relied on to be supported by most ebook readers, but this particular set was also an exception to this rule in Microsoft Reader.

Here's the list of special characters, the 3-digit character entities, and their 4-digit equivalents:
left single quote &#145; &#8216;
right single quote &#146; &#8217;
left double quote &#147; &#8220;
right double quote &#148; &#8221;
en dash &#150; &#8211;
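
The 4-digit values for these five characters can be derived rather than looked up. Here is a minimal Python sketch, on the assumption that the 3-digit numbers are byte values in the Windows-1252 codepage (Python's `cp1252` codec). The helper name `unicode_code_for` is made up for illustration.

```python
def unicode_code_for(cp1252_code: int) -> int:
    """Return the Unicode code point for a Windows-1252 byte value (assumed codepage)."""
    return ord(bytes([cp1252_code]).decode("cp1252"))


# The five 3-digit codes from the list above.
for code in (145, 146, 147, 148, 150):
    print(f"&#{code}; -> &#{unicode_code_for(code)};")
```

Running it prints one 3-digit/4-digit pair per line, which can be checked against the list.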

Here's an example of the HTML and how it renders.

&#147;Stop in the name of the law &#150; I recognize you, prisoner &#145; 94621&#146;,&#148; he cried out!

Intended rendering:
“Stop in the name of the law – I recognize you, prisoner ‘94621’,” he cried out!

Nook preview:
?Stop in the name of the law ? I recognize you, prisoner ? 94621?,? he cried out!

The 3-digit character entity that does render properly in the Nook preview is &#169; (©), the copyright symbol.

By the way, one thing I can't rule out completely is that this rendering issue is due to a recent update to the Nook software, since the two ebooks I released earlier for the Nook now turn out on examination to have the ?'s too, and I'm surprised that I overlooked them during my initial Nook previewing, which I thought was pretty thorough.

Comments, anyone? Have I missed some subtlety of epub format?
04-06-2011, 04:46 AM   #2
Jellby
frumious Bandersnatch
Jellby ought to be getting tired of karma fortunes by now.
Posts: 6,081
Karma: 4571547
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by jlandahl
I noticed that certain 3-digit ISO 8859/1 character entities that went smoothly through epubcheck and rendered as intended in Calibre and on the Kindle were displayed as question marks in the Nook preview! The workaround was to replace them with the corresponding 4-digit Unicode character entities, but now I wonder which type to use for other devices like the iPad and the Sony Reader.

The 3-digit character entity that does render properly in the Nook preview is &#169; (©), the copyright symbol.
Your 3-digit codes are probably referring to some Windows codepage encoding, while ePUB requires everything to be in Unicode. The quote marks sit in different slots in these two encodings; the copyright symbol happens to be in the same slot in both (A9 = 169).

Use Unicode references everywhere (or input the characters directly in UTF-8) and it should be fine. Otherwise you are asking for problems, even if it sometimes works (mainly because you are lucky). Or use real entities: &ldquo; &rdquo; &lsquo; &rsquo; &ndash; &copy;
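
For existing files, the advice above could be applied mechanically. The following Python sketch rewrites decimal references in the 128-159 range to their Unicode equivalents, assuming the codes were meant as Windows-1252 byte values. The function name `fix_c1_references` is hypothetical, and hexadecimal references such as `&#x92;` are deliberately not handled.

```python
import re

# Decimal references 120-159; the function narrows this to 128-159,
# the range that Unicode assigns to C1 control characters.
_CANDIDATE_REFERENCE = re.compile(r"&#(1[2-5][0-9]);")


def fix_c1_references(markup: str) -> str:
    """Rewrite &#128;..&#159; as Unicode references (assumes Windows-1252 intent)."""

    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if not 128 <= code <= 159:
            return match.group(0)
        try:
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # A few slots (e.g. 129) are unassigned in Windows-1252; leave them alone.
            return match.group(0)
        return f"&#{ord(char)};"

    return _CANDIDATE_REFERENCE.sub(replace, markup)


print(fix_c1_references("&#147;Stop&#148; &#150; &#169; 2011"))
```

The copyright reference `&#169;` falls outside the 128-159 range, so it passes through unchanged.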
04-06-2011, 10:18 PM   #3
jlandahl
Thank you for the helpful advice. I got away from using named character entities like &ldquo; some time ago because they weren't always properly rendered, whereas the numeric ones were. For example, in my original post on this board, &amp; didn't render properly and I had to use &#38;.

It's interesting that epubcheck 1.1 doesn't catch non-Unicode character entities, and that, given that the 3-digit codes are Windows ones, they were never rendered properly by Microsoft Reader!

Last edited by jlandahl; 04-06-2011 at 11:24 PM.
04-07-2011, 04:38 AM   #4
Jellby
Quote:
Originally Posted by jlandahl
Thank you for the helpful advice. I got away from using named character entities like &ldquo; some time ago because they weren't always properly rendered, whereas the numeric ones were. For example, in my original post on this board, &amp; didn't render properly and I had to use &#38;.
That's strange, I've never had any problem with named entities.

Quote:
It's interesting that epubcheck 1.1 doesn't catch non-Unicode character entities, and that, given that the 3-digit codes are Windows ones, they were never rendered properly by Microsoft Reader!
Well, they are not exactly non-Unicode, it's just that in Unicode that particular slot is not assigned to the character you want. For example ’ is #8217 in Unicode, and #146 in Windows-1258; but #146 in Unicode is just a control character (Private Use 2). If you use #146 in your code, epubcheck has no way of knowing whether you wanted to use the right single quote or the control character.
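
The ambiguity can be seen with Python's standard library. In this sketch a strict XML parser accepts the reference 146 and returns the control character, while `html.unescape`, which follows the HTML5 rules, remaps the same reference to the right single quote. That lenient readers do something similar while the Nook preview does not is only a guess, not something confirmed in this thread.

```python
import html
import unicodedata
import xml.etree.ElementTree as ET

# Strict XML parsing: the reference is well-formed, so nothing is reported,
# and the result is the C1 control character U+0092.
xml_char = ET.fromstring("<p>&#146;</p>").text
print(hex(ord(xml_char)), unicodedata.category(xml_char))

# HTML5-style decoding: code 146 is remapped to U+2019.
html_char = html.unescape("&#146;")
print(hex(ord(html_char)), unicodedata.name(html_char))

# The unambiguous Unicode reference decodes the same way under both rules.
print(ET.fromstring("<p>&#8217;</p>").text == html.unescape("&#8217;"))
```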