#1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
accents and entities in an epub
Currently I am finishing an index and ebook of a scholarly work -- seven languages, 467 footnotes, etc. The only remaining item prior to customer acceptance is to get the accented characters in French and German showing correctly.
My standard opening for XML files is:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
This standard opening leaves all of the &Eacute; and &ocirc;, etc. showing literally in the text. I have tried following up the DOCTYPE lines above with
<!ENTITY HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
FlightCrew and EpubCheck are very unhappy with this. Any suggestions? |
#2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
If your file is actually saved as UTF-8 (not merely declared as UTF-8 in the XML header), accented characters should not be a problem. Are you saying that the HTML entities remain in the rendered text, or only in the code? The second is no issue; the first is quite peculiar.
Also don't touch the DOCTYPE. |
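The difference between declaring UTF-8 and actually saving as UTF-8 can be checked mechanically. Below is a minimal Python sketch, not a tool anyone in this thread used: the helper names `declared_encoding` and `is_valid_utf8` and the sample strings are made up for illustration.

```python
import re
from typing import Optional


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, lowercased, or None."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "\u00c9p\u00eetre" is "Epitre" with E-acute and i-circumflex.
text = '<?xml version="1.0" encoding="utf-8"?><p>\u00c9p\u00eetre</p>'
saved_as_utf8 = text.encode("utf-8")      # declaration and bytes agree
saved_as_latin1 = text.encode("latin-1")  # declaration says utf-8, bytes do not

for label, data in (("utf-8", saved_as_utf8), ("latin-1", saved_as_latin1)):
    print(label, declared_encoding(data), is_valid_utf8(data))
```

A file that contains only ASCII plus entity references passes this check trivially, so a clean result here points the search toward how the entities themselves are written.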
#3 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
The file is entirely printable ASCII characters 0x20 through 0x7e along with 0x0a. This lets me create and manipulate data in a word processor.
The rendered text looks like this example: "Hippolyte Hemmer, Cl&eacute;ment de Rome: &Eacute;p&icirc;tre aux Corinthiens..." etc. |
#4 |
Grand Sorcerer
Posts: 28,548
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Check to be certain those entities aren't being XML-escaped (that is, the leading & being turned into &amp;). If you entered (pasted, typed, whatever...) all the data from your word processor document into a WYSIWYG editor such as Sigil's Book View, that's likely to happen. Entities need to be pasted/typed into Code View (speaking strictly about Sigil here)... because they're, well... code.
Last edited by DiapDealer; 12-13-2012 at 06:01 PM. |
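One way to check for that kind of escaping is to search the generated XHTML for entity references whose ampersand has itself been escaped. This is a rough Python sketch under that assumption; the names `DOUBLE_ESCAPED`, `find_double_escaped`, and `unescape_once` are hypothetical, not part of Sigil or any other tool mentioned here.

```python
import re
from typing import List

# Matches an entity whose ampersand was escaped a second time,
# e.g. "&amp;eacute;" or "&amp;#233;" or "&amp;#xe9;".
DOUBLE_ESCAPED = re.compile(r"&amp;(#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def find_double_escaped(markup: str) -> List[str]:
    """List every double-escaped entity found in the markup."""
    return [m.group(0) for m in DOUBLE_ESCAPED.finditer(markup)]


def unescape_once(markup: str) -> str:
    """Undo one level of escaping on those entities only."""
    return DOUBLE_ESCAPED.sub(r"&\1;", markup)


broken = "Cl&amp;eacute;ment de Rome &amp; &amp;Eacute;p&amp;icirc;tre"
print(find_double_escaped(broken))
print(unescape_once(broken))
```

A caveat: text that is meant to display an entity literally (for example a tutorial showing `&amp;lt;`) would be flagged too, so the matches are worth reviewing before rewriting anything.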
#5 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
I posted a response yesterday, but do not see it here.
DiapDealer -- You were correct. My MakeEpub program was changing ampersands to %amp;. I made changes in the source code. The ebook now shows accents correctly. Many thanks. |
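For anyone with a similar converter: the source of MakeEpub isn't shown in this thread, so the following is only a guess at the shape of such a fix, written in Python. It escapes an ampersand only when it does not already begin an entity or character reference. The names `BARE_AMPERSAND` and `escape_bare_ampersands` are invented for the example.

```python
import re

# An ampersand that is NOT the start of "&name;", "&#123;" or "&#x7b;".
BARE_AMPERSAND = re.compile(
    r"&(?!(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape stray ampersands but leave existing references untouched."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("Cl&eacute;ment & Rome"))
print(escape_bare_ampersands("AT&T and &#233;"))
```

Because an already-escaped `&amp;` counts as an existing reference, running the function twice gives the same result as running it once.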
#6 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2012
Device: Nook
|
Correction to typo: Make that &amp;
|
#7 |
Grand Sorcerer
Posts: 28,548
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
All's well that ends well.
#8 |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
|
Warning: You should *not* use HTML entities like &eacute;. EPUB is based on XHTML, not HTML, and XHTML does not define any entities other than &amp;, &lt;, &gt;, &apos;, and &quot; (for &, <, >, ', and ", respectively).
That means that other HTML entities are not technically legal in an EPUB file, and a reader would be within its rights to barf if it encounters them. You should always replace those entities with proper XML numeric character references, e.g. & #233; or & #xe9; (without the space after the & in both cases, but I can't type them that way because this forum keeps translating them into é) instead of &eacute;.
I originally tried to provide an incomplete list of some common substitutions in the form of Perl regular expressions, but the forum ate those, too. Here's the same list as text.
prime -> #8242
Prime -> #8243
ldquo -> #8220
rdquo -> #8221
lsquo -> #8216
rsquo -> #8217
mdash -> #8212. Suggest following this by character #8203 (zero-width space as a wrap hint).
ndash -> #8211. Again, suggest adding a zero-width space afterwards.
copy -> #169
trade -> #8482
deg -> #176
aacute -> #225
eacute -> #233
oacute -> #243
ntilde -> #241
iuml -> #239
ecirc -> #234
nbsp -> #160
For a full list, see http://www.fileformat.info/format/w3c/htmlentity.htm.
Last edited by dgatwood; 12-19-2012 at 04:55 PM. |
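Since the Perl expressions did not survive the forum software, here is a rough equivalent in Python rather than Perl (a substitution, not the original code). It leans on the standard library table `html.entities.name2codepoint` instead of a hand-typed list; the function name `to_numeric` and the decision to leave the five XML-predefined entities alone are assumptions made for this sketch.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser understands; these are left as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric(markup: str) -> str:
    """Rewrite named HTML entities as decimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: leave untouched
        return "&#%d;" % name2codepoint[name]
    return NAMED_ENTITY.sub(replace, markup)


print(to_numeric("Cl&eacute;ment &amp; &Eacute;p&icirc;tre"))
```

Unknown names are passed through unchanged rather than guessed at, so a validator such as EpubCheck would still be needed to catch them.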
#9 | |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
See also http://en.wikipedia.org/wiki/List_of...cters_in_XHTML |
|
#10 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
At w3.org, there is this list of entities:
http://www.w3.org/2000/07/8378/xhtml...s/entities.xml |
#11 |
Grand Sorcerer
Posts: 28,548
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It all depends on the parser and the XHTML DTD. I've never run into an ePub parser that couldn't handle them (assuming proper declarations), but I suppose it's possible. Perhaps someone is confusing XHTML 1.1 and ePub2 with XHTML5 and ePub3? Named entities are no longer technically valid in that situation.
Last edited by DiapDealer; 12-20-2012 at 04:01 PM. |
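The parser dependence is easy to see with a generic XML parser that has no DTD to consult. The sketch below uses Python's bundled `xml.etree.ElementTree`, which does not load the XHTML DTD; it illustrates the DTD-less case only and says nothing about how any particular e-reader behaves. The helper name `parses` is made up for the example.

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """True if the fragment is accepted by a DTD-less XML parser."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


for fragment in ("<p>&eacute;</p>", "<p>&#233;</p>", "<p>&#xe9;</p>", "<p>&amp;</p>"):
    print(fragment, parses(fragment))
```

The named entity is rejected as undefined, while the numeric references and the predefined `&amp;` are accepted, which is the situation a reader ends up in if it parses the content without the XHTML 1.1 entity declarations.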
#12 | |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
|
Quote:
#13 |
Bemused by possibilities
Posts: 58
Karma: 480244
Join Date: Jul 2012
Device: iPad3, Kobo
|
Kobo requests that you use decimal (numeric) entities and not named character entities. I assume that it would be the same for other retailers.
List of entities: http://www.derby.co.nz/web-development/entities.html |
#14 |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
|
#15 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I haven't seen that request from Kobo, and it sounds a bit silly. It is much easier to type (and remember...) the named HTML entities than their numeric equivalents.
|
Tags |
accents, entities, html1.1, xml files |
|