Named entities or not?


alecE
07-20-2009, 02:17 AM
I expect this is an elementary noob question, but I've seen different opinions voiced on this:

When preparing HTML text prior to creating an epub, should I use named entities (e.g. &ldquo;, &aelig;) or am I OK to use the 'real' symbols? My first reaction was that I should use the named entities, then I saw a suggestion that, provided everything was UTF-8, all would be well.

My context is that I'm slowly learning how to convert .txt format text into .epub files for reading on my 505 and I'm working towards a consistent editing process. At the moment I'm not expecting to produce any other format.

Thanks in advance :)

ilovejedd
07-20-2009, 02:46 AM
I use HTML source files so I prefer UTF-8 encoding. Not sure how you'll handle character set recognition if working from plain text, though.
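For the plain-text case, a rough Python sketch of one way to handle character set recognition: try strict UTF-8 first and fall back to a legacy code page. The fallback to Windows-1252 is only an assumption about typical .txt sources, and `decode_text` is a made-up helper name.

```python
def decode_text(data: bytes) -> tuple[str, str]:
    """Decode bytes as UTF-8 if valid, otherwise assume Windows-1252."""
    try:
        # Strict decoding fails loudly on bytes that are not valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: legacy .txt files are Windows-1252; adjust as needed.
        return data.decode("cp1252", errors="replace"), "cp1252"


text, guessed = decode_text("caf\u00e9".encode("cp1252"))
print(guessed)  # cp1252
```

Once decoded, the text can be written back out with `encoding="utf-8"` so everything downstream sees one encoding.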

Jellby
07-20-2009, 06:19 AM
I use a mixture. I encode the files in UTF-8, but I still use named entities for characters I cannot easily input with the keyboard or which may be difficult to distinguish with my preferred editor and font. I input "", "", "", "", etc., but "&lsquo;", "&mdash;", "&nbsp;"...

netseeker
07-20-2009, 07:07 AM
I prefer using named entities even with UTF-8. Why? Because for some characters like "<", ">" and "&", named (or numeric) entities are necessary anyway, and I prefer using one method for all entities.
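A small Python sketch of that point: whatever is decided for the typographic characters, the three markup characters always have to be escaped. It uses the standard library's `html.escape`; the sample string is invented for illustration.

```python
from html import escape

# Hypothetical sample text; the curly quotes stay as literal UTF-8 characters.
raw = "\u201cFish & chips\u201d costs < 5 > 3"

# quote=False leaves " and ' alone, which is enough for element content.
escaped = escape(raw, quote=False)
print(escaped)  # “Fish &amp; chips” costs &lt; 5 &gt; 3
```

Only "&", "<" and ">" are touched; everything else passes through unchanged.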

zelda_pinwheel
07-20-2009, 07:27 AM
i use named entities for all special characters (&mdash;, &ldquo; etc. but also &agrave;, &oelig;, and &eacute; etc.), it's a habit from coding for the web. i have noticed that when creating an xhtml file in dreamweaver, if you write the special character in the "design" box, it will automatically be encoded with the entity in the code; however if you are working with the epub dtd most special characters will not be automatically encoded, which may mean that it's not specified as necessary with that doctype. but, since i am wary of (bad) surprises i think it's safer to use the named entities even with the utf-8 encoding. however your question makes me realise i have not verified what the epub standard specifically says about this; interesting question.

pepak
07-20-2009, 07:49 AM
Technically, it is best to use numeric entities (&#1234;) because they are most compatible (you may not realize it, but named entities need to be defined in the document's DTD, which prevents you from using the same representation in XHTML and, say, plain XML). But in reality I still use named entities, mainly because they are readable in plain text - I mean, if I see &ldquo; I know what it represents, unlike &#8220;
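The DTD point can be checked with a short Python sketch. It feeds two fragments to the standard library's `xml.etree.ElementTree`, which here stands in for any plain XML parser with no XHTML DTD loaded; `parses` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """Return True if a plain XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Undefined named entities end up here.
        return False


numeric_ok = parses("<p>&#8220;Hello&#8221;</p>")
named_ok = parses("<p>&ldquo;Hello&rdquo;</p>")
print(numeric_ok, named_ok)  # True False
```

The numeric form needs no declarations at all, while the named form is rejected as an undefined entity unless something declares it.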

Jellby
07-20-2009, 08:01 AM
I've said that before, but I also use a mix of named and numbered entities for quotes and apostrophes. I use &rsquo; for a curly right single quote, and &#8217; for a curly apostrophe. They are exactly the same character, but it's nice to have them different in the source files if I want to search and replace or something.
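A quick Python sketch of why this works, using the standard library's `html.unescape`. The sample sentence is made up; the only claim is that both spellings decode to U+2019 while staying distinct in the source.

```python
from html import unescape

# Hypothetical source line: &#8217; marks the apostrophe, &rsquo; the closing quote.
source = "&lsquo;It&#8217;s late,&rsquo; she said."

# Both spellings decode to the same character, U+2019.
same_char = unescape("&rsquo;") == unescape("&#8217;") == "\u2019"

# In the source they can still be counted (or replaced) separately.
apostrophes = source.count("&#8217;")
closing_quotes = source.count("&rsquo;")

rendered = unescape(source)
print(same_char, apostrophes, closing_quotes)  # True 1 1
```

After decoding, `rendered` contains two identical U+2019 characters, so the distinction only exists while the text is still in entity form.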

zelda_pinwheel
07-20-2009, 08:03 AM
Originally posted by Jellby:
> I've said that before, but I also use a mix of named and numbered entities for quotes and apostrophes. I use &rsquo; for a curly right single quote, and &#8217; for a curly apostrophe. They are exactly the same character, but it's nice to have them different in the source files if I want to search and replace or something.

ah, that is a clever trick...

Ankh
07-20-2009, 09:39 AM
Originally posted by Jellby:
> I've said that before, but I also use a mix of named and numbered entities for quotes and apostrophes. I use &rsquo; for a curly right single quote, and &#8217; for a curly apostrophe. They are exactly the same character, but it's nice to have them different in the source files if I want to search and replace or something.

The best possible thing would be to use xhtml quote tags: <q> </q>, then define the quote characters in css, the way it is intended to be. :( Such a solution works for nested quotes, too.

Sadly, this doesn't work for me in ADE/505.

Jellby
07-20-2009, 10:21 AM
Originally posted by Ankh:
> The best possible thing would be to use xhtml quote tags: <q> </q>, then define the quote characters in css, the way it is intended to be. :( Such a solution works for nested quotes, too.

That would be using the "quotes" property, right? Unfortunately, it appears "quotes" is not supported in the current ePUB spec (http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html#Section3.3).

Another problem: How does it work with multi-paragraph quotes? I believe the usual English practice is to add an opening quote character before each new paragraph, while in Spanish it's the closing quote. And verses or letters inside a quoted text?

pepak
07-20-2009, 10:22 AM
Originally posted by Ankh:
> The best possible thing would be to use xhtml quote tags: <q> </q>, then define the quote characters in css, the way it is intended to be. :( Such a solution works for nested quotes, too.

It wouldn't work for non-paired quotes, though, which are quite common in many American books:
"Some long paragraph spoken by A.
"A still talks.
"Even more talking from A.
"Finally A concludes his lengthy statement."
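To find those paragraphs before deciding how to mark them up, a rough Python sketch could flag every paragraph that opens more double quotes than it closes. It assumes the text already uses curly quotes (U+201C and U+201D); `unbalanced_paragraphs` is a made-up name, and this is only one possible check.

```python
OPEN_Q, CLOSE_Q = "\u201c", "\u201d"


def unbalanced_paragraphs(paragraphs):
    """Return indexes of paragraphs with more opening than closing double quotes."""
    return [
        i for i, p in enumerate(paragraphs)
        if p.count(OPEN_Q) > p.count(CLOSE_Q)
    ]


# The example above, with curly quotes substituted for the straight ones.
speech = [
    OPEN_Q + "Some long paragraph spoken by A.",
    OPEN_Q + "A still talks.",
    OPEN_Q + "Even more talking from A.",
    OPEN_Q + "Finally A concludes his lengthy statement." + CLOSE_Q,
]
print(unbalanced_paragraphs(speech))  # [0, 1, 2]
```

Only the last paragraph is balanced, which is exactly the case paired <q> tags cannot express.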

Ankh
07-20-2009, 11:29 AM
Originally posted by Jellby:
> That would be using the "quotes" property, right? Unfortunately, it appears "quotes" is not supported in the current ePUB spec (http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html#Section3.3).

Darn! RTFM, Ankh, RTFM.

Thanks Jellby, although I still don't see the logic behind the omission of that specific CSS property from the ePub spec.

Originally posted by Jellby:
> Another problem: How does it work with multi-paragraph quotes? I believe the usual English practice is to add an opening quote character before each new paragraph, while in Spanish it's the closing quote. And verses or letters inside a quoted text?

I would not mind treating that as an exception and reverting to hard-coding quotes into the text. Those situations are rare, right?

pepak
07-20-2009, 11:43 AM
Originally posted by Ankh:
> I would not mind treating that as an exception and reverting to hard-coding quotes into the text. Those situations are rare, right?

Quite common with graphomaniac authors.

Jellby
07-20-2009, 12:09 PM
Originally posted by Ankh:
> I would not mind treating that as an exception and reverting to hard-coding quotes into the text. Those situations are rare, right?

Not that rare. It may not occur in every chapter, but it tends to happen at least a few times in every one of the books I've made.

I tried a solution with custom named entities (http://www.mobileread.com/forums/showthread.php?t=47195), but it doesn't seem to work as I expected.

alecE
07-20-2009, 04:41 PM
Thanks for all the responses - I *will* stick with named entities (and continue to specify utf8).
Neat trick re. the separation of curly right quote & apostrophe - hadn't thought of that so thanks again.
Maybe it's just the sort of books I read, but many of the books I've been playing with present the dreaded multi-paragraph un-matched quote problem (Buchan, Kipling, Maupassant just to name some recent examples). Sadly the solution of making that wretched character 'A' a non-person doesn't seem to be optimal.
Nested quotes - I've often encountered the solution of maintaining the outer quotes as proper double quotes, and then using single quotes for the inner section. So far I haven't encountered a triple-decker quote sandwich. (OK, I know, vast swathes of triply-nested quotes now sweeping in from the west...)

Jellby
07-21-2009, 05:36 AM
Originally posted by alecE:
> So far I haven't encountered a triple-decker quote sandwich. (OK, I know, vast swathes of triply-nested quotes now sweeping in from the west...)

I've encountered them, and I believe the common solution is just to alternate the quote style, as in:

“Would you believe it?”, he said, “she said I'm ‘a little pathetic “snob”’.”
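The alternation rule can be sketched in a few lines of Python. The nested-list representation and the `render` function are invented for illustration: a list stands for a quoted span, a string for plain text, and the quote pair is picked by nesting depth.

```python
# Quote pairs by nesting depth: double, single, double, ...
PAIRS = [("\u201c", "\u201d"), ("\u2018", "\u2019")]


def render(node, depth=0):
    """Render a nested quote tree; a list is a quoted span, a str is plain text."""
    if isinstance(node, str):
        return node
    open_q, close_q = PAIRS[depth % len(PAIRS)]
    inner = "".join(render(child, depth + 1) for child in node)
    return open_q + inner + close_q


tree = ["she said I'm ", ["a little pathetic ", ["snob"]], "."]
print(render(tree))  # “she said I'm ‘a little pathetic “snob”’.”
```

Because the pair depends only on `depth % 2`, a fourth or fifth level simply keeps alternating.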

Valloric
07-21-2009, 11:13 AM
Huh. I must then be the only one who resolves all entities into UTF-8 characters, wherever possible.

Ankh
07-21-2009, 12:24 PM
Originally posted by Valloric:
> Huh. I must then be the only one who resolves all entities into UTF-8 characters, wherever possible.

No, you are not.

I have scripts that automate creation or decompression of ePubs, and replacements back and forth between named entities and UTF-8 take place there. I run checks in my creation script as well.

Everything is utf8 in the final ePub.
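The scripts themselves are not shown in the thread, so the following is only a guess at what such a back-and-forth replacement might look like in Python. It relies on the standard `html.entities` tables; the function names and the `KEEP` set are hypothetical, and the markup entities are deliberately left escaped.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Markup-significant entities must stay escaped in XHTML.
KEEP = {"lt", "gt", "amp", "quot", "apos"}


def entities_to_utf8(text: str) -> str:
    """Replace known named entities with literal characters, except KEEP."""
    def repl(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # leave untouched
        return chr(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def utf8_to_entities(text: str) -> str:
    """Replace non-ASCII characters with a named entity where one exists."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if ord(ch) > 127 and name else ch)
    return "".join(out)


sample = "&ldquo;Caf&eacute; &amp; bar&rdquo; &mdash; ok"
decoded = entities_to_utf8(sample)
print(utf8_to_entities(decoded) == sample)  # True
```

Working with entities while editing and converting to UTF-8 characters as the last step before packaging would give the "everything is UTF-8 in the final ePub" result described above.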