12-13-2012, 10:56 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
accents and entities in an epub
Currently I am finishing an index and ebook of a scholarly work -- seven languages, 467 footnotes, etc. The only remaining item prior to customer acceptance is to get the accented characters in French and German showing correctly.
My standard opening for XML files is:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
This standard opening leaves all of the &Eacute; and &ocirc; (etc.) entities showing literally in the text. I have tried following the DOCTYPE line above with
<!ENTITY HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
FlightCrew and EpubCheck are very unhappy with this. Any suggestions? |
12-13-2012, 12:54 PM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
If your file is actually encoded in UTF-8 (not just declared as such), it should not be a problem. Are you saying that the HTML entities remain in the rendered text or in the code? The latter is no issue; the former is quite peculiar.
Also, don't touch the DOCTYPE. |
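A quick way to check the "actually encoded, not just declared" point is to look at the raw bytes of a chapter file. The sketch below assumes Python 3 and a file read in binary mode; the function name `check_xhtml_bytes` is hypothetical. It reports whether the bytes are valid UTF-8, whether they are pure ASCII, and which named entities (other than XML's five built-ins) the file still relies on.

```python
import re

# Named references such as &Eacute; (numeric ones like &#233; are not matched).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# The only entities an XML parser knows without reading a DTD.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def check_xhtml_bytes(data: bytes) -> dict:
    """Report how a chapter file is really encoded and which
    HTML named entities it still depends on."""
    try:
        text = data.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        # latin-1 never fails, so we can still scan for entity names.
        text = data.decode("latin-1")
        utf8_ok = False
    names = {m.group(1) for m in NAMED_REF.finditer(text)}
    return {
        "valid_utf8": utf8_ok,
        "pure_ascii": all(b < 0x80 for b in data),
        "html_entities": sorted(names - XML_BUILTINS),
    }


if __name__ == "__main__":
    sample = b"<p>Cl&eacute;ment de Rome: &Eacute;p&icirc;tre &amp; notes</p>"
    print(check_xhtml_bytes(sample))
    # {'valid_utf8': True, 'pure_ascii': True, 'html_entities': ['Eacute', 'eacute', 'icirc']}
```

A pure-ASCII file is also valid UTF-8, so the declaration is fine either way; what matters is whether the remaining named entities can be resolved by the reader.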
12-13-2012, 02:36 PM | #3 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
The file consists entirely of printable ASCII characters (0x20 through 0x7E) plus 0x0A (line feed). This lets me create and manipulate data in a word processor.
The rendered text looks like this example: "Hippolyte Hemmer, Cl&eacute;ment de Rome: &Eacute;p&icirc;tre aux Corinthiens..." etc. |
12-13-2012, 03:21 PM | #4 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Check to be certain those entities aren't being XML-escaped. If you entered (pasted, typed, whatever...) all the data from your word processor document into a WYSIWYG editor such as Sigil's Book View, that's likely to happen. Entities need to be pasted/typed into Code View (speaking strictly about Sigil here)... because they're, well... code. Otherwise &Eacute; becomes &amp;Eacute;, just like <p> becomes &lt;p&gt;. What, if anything, are you using to build/create the ePub from your word processor document?
Last edited by DiapDealer; 12-13-2012 at 06:01 PM. |
12-14-2012, 06:37 AM | #5 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
I posted a response yesterday, but do not see it here.
DiapDealer -- You were correct. My MakeEpub program was changing ampersands to %amp;. I made changes in the source code. The ebook now shows accents correctly. Many thanks. |
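The bug described here (a build tool escaping every ampersand, including the ones that start an entity) can be reproduced in a few lines. This is a sketch assuming Python 3; `escape_text_preserving_refs` is a hypothetical helper, not part of MakeEpub or any library. It escapes only "bare" ampersands and leaves existing references such as &Eacute; or &#233; alone. It is meant for text content only, not for strings that already contain markup.

```python
import re
from xml.sax.saxutils import escape

# An ampersand that does NOT already begin a named or numeric reference.
BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)")


def escape_text_preserving_refs(text: str) -> str:
    """Escape &, < and > for XHTML text, keeping existing references intact."""
    text = BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    s = "Cl&eacute;ment & Co"
    print(escape(s))                       # Cl&amp;eacute;ment &amp; Co  (double-escaped)
    print(escape_text_preserving_refs(s))  # Cl&eacute;ment &amp; Co
```

The naive `escape` call is what produces the literal "&eacute;" seen in the rendered book; the regex version only touches ampersands that cannot be the start of a reference.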
12-14-2012, 06:38 AM | #6 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
Correction to typo: make that &amp;
|
12-14-2012, 07:13 AM | #7 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
All's well that ends well.
|
12-19-2012, 04:36 PM | #8 |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
|
Warning: You should *not* use HTML entities like &eacute;. EPUB is based on XHTML, not HTML, and XHTML does not define any entities other than &amp;, &lt;, &gt;, &apos;, and &quot; (&, <, >, ', and ", respectively).
That means that other HTML entities are not technically legal in an EPUB file, and a reader would be within its rights to barf if it encounters them. You should always replace those entities with numeric character references, e.g. &#233; or &#xe9; instead of &eacute;. I originally tried to provide an incomplete list of some common substitutions in the form of Perl regular expressions, but the forum ate those. Here's the same list as text:
&prime; -> &#8242;
&Prime; -> &#8243;
&ldquo; -> &#8220;
&rdquo; -> &#8221;
&lsquo; -> &#8216;
&rsquo; -> &#8217;
&mdash; -> &#8212; (suggest following this with &#8203;, a zero-width space, as a wrap hint)
&ndash; -> &#8211; (again, suggest adding a zero-width space afterwards)
&copy; -> &#169;
&trade; -> &#8482;
&deg; -> &#176;
&aacute; -> &#225;
&eacute; -> &#233;
&oacute; -> &#243;
&ntilde; -> &#241;
&iuml; -> &#239;
&ecirc; -> &#234;
&nbsp; -> &#160;
For a full list, see http://www.fileformat.info/format/w3c/htmlentity.htm.
Last edited by dgatwood; 12-19-2012 at 04:55 PM. |
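Rather than applying the substitution list above by hand, the conversion can be scripted. This is a minimal sketch assuming Python 3; it uses the standard library's `html.entities.name2codepoint` table (the HTML 4.01 entity set), and the function name `to_numeric_refs` is hypothetical. XML's five built-in entities and any unknown names are left untouched.

```python
import re
from html.entities import name2codepoint

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_refs(text: str, hex_refs: bool = False) -> str:
    """Replace HTML named entities with decimal (or hex) numeric references."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        # Keep XML built-ins and anything not in the HTML 4.01 table.
        if name in XML_BUILTINS or name not in name2codepoint:
            return m.group(0)
        cp = name2codepoint[name]
        return f"&#x{cp:x};" if hex_refs else f"&#{cp};"

    return NAMED_REF.sub(repl, text)


if __name__ == "__main__":
    print(to_numeric_refs("Cl&eacute;ment de Rome: &Eacute;p&icirc;tre &amp; notes"))
    # Cl&#233;ment de Rome: &#201;p&#238;tre &amp; notes
```

Decimal output is the default here since that is the form some retailers are reported to prefer; pass `hex_refs=True` for the &#xe9; style.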
12-20-2012, 12:56 PM | #9 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
See also http://en.wikipedia.org/wiki/List_of...cters_in_XHTML |
|
12-20-2012, 02:47 PM | #10 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
At w3.org, there is this list of entities:
http://www.w3.org/2000/07/8378/xhtml...s/entities.xml |
12-20-2012, 03:43 PM | #11 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It all depends on the parser and the XHTML DTD. I've never run into an ePub parser that couldn't handle them (assuming proper declarations), but I suppose it's possible. Perhaps someone is confusing XHTML 1.1 and ePub2 with XHTML5 and ePub3? HTML named entities are no longer technically valid in that situation.
Last edited by DiapDealer; 12-20-2012 at 04:01 PM. |
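The "depends on the parser" point is easy to demonstrate with a parser that does not load the XHTML DTD. This sketch assumes Python 3's `xml.etree.ElementTree` (a non-validating, expat-based parser); `parses_as_xml` is a hypothetical helper. Without a DTD supplying definitions, a named entity like &Eacute; is an undefined entity and parsing fails, while numeric references and the five XML built-ins always work.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML without any DTD."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses_as_xml("<p>&Eacute;p&icirc;tre</p>"))  # False: undefined entity
    print(parses_as_xml("<p>&#201;p&#238;tre</p>"))     # True
    print(parses_as_xml("<p>&amp; &lt; &gt;</p>"))      # True
```

A reading system that does fetch and honor the XHTML 1.1 DTD can resolve the named entities; one that does not behaves like this parser, which is why numeric references are the safer choice.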
12-20-2012, 09:36 PM | #12 | |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
|
|
|
04-15-2013, 09:31 PM | #13 |
Bemused by possibilities
Posts: 58
Karma: 480244
Join Date: Jul 2012
Device: iPad3, Kobo
|
Kobo requests that you use decimal (numeric) character references rather than named entities. I assume the same goes for other retailers.
List of entities http://www.derby.co.nz/web-development/entities.html |
04-16-2013, 12:32 AM | #14 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
|
04-16-2013, 01:19 AM | #15 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I haven't seen that request from Kobo, and it sounds a bit silly. The named HTML entities are much easier to type (and remember...) than their numeric equivalents.
|