07-11-2011, 04:04 AM | #1 |
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
|
Best practices: Special characters
Is it true that simple characters (single and double quotes, en dash, em dash, ellipsis, etc) will display differently on different e-readers?
I ask because when I preview my epub's HTML files in Firefox, these characters display as bad code (eg: em dash = –, single quote = ‘, sq close = ’), although they display fine in ADE and on the iPad using iBooks. For the sake of compatibility across the board, would it be best to convert all characters to their HTML code equivalent? Is this a necessary process? What would be the best way to do it? (I'm thinking a find-all/replace-all for each character would be one approach, but perhaps there is a better way.) NB: I am using IDC5.5 to export to epub, but I am guessing this question would apply to others using a different program for epub creation - correct me if I am wrong and I will add an ID to the thread title. |
07-11-2011, 04:39 AM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
It should not be necessary. If you declare everything as UTF-8 it usually goes well. Of course, the character has to be in the font of the reader app...
|
07-11-2011, 04:42 AM | #3 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
Nope, all those characters are pretty safe. What you're seeing is mojibake. You're using UTF-8, but the browser is decoding it as Latin-1 (ish).
This is entirely plausible with epub. Your... content.opf file declares the HTML files as application/xhtml+xml (and XML defaults to UTF-8), but obviously you're not asking your browser to read the OPF file, only the HTML file. It's possible your browser is defaulting to Latin-1 (ish). In which case, get a better browser to test with. Firefox will auto-detect compliant UTF-8.
The other obvious possibility is that your HTML files are lying. They may contain a <meta> tag which declares the file as Latin-1 or similar (ISO- and a numeric code). Anything that expects XML will ignore that, but browsers which expect HTML will obey it.
Finally, a technical note. XHTML and HTML are actually different syntaxes. In HTML4 and below, they're technically incompatible, but browser-HTML is compatible. In HTML5, compatibility is possible. In both cases, complying with both HTML and XHTML imposes some extra restrictions. (See "polyglot markup" for the current draft recommendations.) E.g. you're supposed to stick to UTF-8, because that's the default for XML, and the declaration to specify a different encoding is not HTML-compatible. So no going insane and switching to obsolete encodings like UTF-16 :-).
If you want to make life easier for yourself, you'd be better off at least using the EPUBReader extension for Firefox. Then you can open the EPUB, Firefox will read your OPF file, and it should just work without having to change anything.
Second note: all the characters you mentioned will _display_ correctly, but there's a caveat with em dashes. Most dedicated e-readers are too dumb to break lines at em dashes - so you get very long words, which interfere with justification (assuming you use justification). Some people prefer to avoid them, and use en dashes with spaces instead.
Third note: Apparently IDC5.5 is much better than previous editions, but people still end up having to look carefully at & tweak the generated XML. So you may well end up having to fix their code (although I would be surprised if they've managed to screw up basic character encoding for no good reason). |
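For anyone who wants to see the mojibake mechanism described above in isolation, here is a minimal Python sketch. It assumes the "Latin-1 (ish)" codec is Windows-1252 (cp1252), which is an assumption rather than something stated in the thread, and the helper names `garble` and `ungarble` are made up for illustration. It encodes a few typographic characters as UTF-8 and then decodes the bytes with the wrong codec, which is roughly what a misconfigured browser does.

```python
# Illustrative sketch only: reproduce mojibake by decoding UTF-8 bytes
# with the wrong codec. cp1252 is an assumed stand-in for "Latin-1 (ish)".
SAMPLES = {
    "en dash": "\u2013",
    "em dash": "\u2014",
    "left single quote": "\u2018",
    "right single quote": "\u2019",
}


def garble(text, wrong_codec="cp1252"):
    """Encode as UTF-8, then decode with the wrong codec (hypothetical helper)."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


def ungarble(text, wrong_codec="cp1252"):
    """Reverse garble(); only works if no byte was lost to errors='replace'."""
    return text.encode(wrong_codec).decode("utf-8")


for name, char in SAMPLES.items():
    print(f"{name}: {char!r} -> {garble(char)!r}")
```

Each character becomes a three-character sequence starting with "â€", which matches the pattern quoted in the first post. The files themselves are fine in that case; only the decoding step is wrong.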
07-11-2011, 04:51 AM | #4 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Check that the character encoding in Firefox is set to Auto-Detect - Universal, or alternatively UTF-8. If it's set to anything else then it won't render the characters properly. ePub readers should all handle UTF-8 so I wouldn't worry about it.
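To rule out the file-side causes raised in this thread (bytes that are not really UTF-8, or a <meta> tag declaring some other charset), something like the following Python sketch could be run over an unzipped epub. The function names, the regular expression, and the list of file extensions are all assumptions made for illustration; this is a rough check, not a replacement for epubcheck.

```python
# Rough sketch: flag files that are not valid UTF-8 or whose <meta> tag
# declares a different charset. Names and regex are illustrative assumptions.
import re
from pathlib import Path

META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def check_encoding(data):
    """Return a list of problem descriptions for one file's raw bytes."""
    problems = []
    try:
        data.decode("utf-8")  # strict decoding: raises on invalid sequences
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    match = META_CHARSET.search(data)
    if match and match.group(1).lower() not in (b"utf-8", b"utf8"):
        problems.append(f"meta tag declares {match.group(1).decode('ascii')}")
    return problems


def check_tree(root):
    """Check every .html/.xhtml/.htm file under root; return {path: problems}."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".xhtml", ".htm"}:
            problems = check_encoding(path.read_bytes())
            if problems:
                results[str(path)] = problems
    return results
```

Calling `check_tree("path/to/unzipped_epub")` (a placeholder path) returns an empty dict when nothing suspicious is found.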
|
07-11-2011, 10:17 PM | #5 |
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
|
That's a relief, thanks all for your help!
Sourcejedi, I've actually been wondering why my epubs are made up of html files instead of xhtml files. I am exporting from ID, unzipping using Stuffit Expander, and each file has the html extension by default. If I look at the source, each file starts with:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 //EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
Does that ensure I'm working in xhtml, or should my files also be using the xhtml extension? |
07-12-2011, 01:31 AM | #6 |
Media Bloke
Posts: 2,381
Karma: 113956855
Join Date: Sep 2010
Location: NSW - Australia
Device: iOS
|
I had this hassle exporting from IDCS4. None of the helpful hints from the forum solved it. I ended up using Notepad++ to find and replace across ALL files in the directory. Since moving to CS5 I haven't had the problem. If you do track down the cause, please post it here.
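A find-and-replace across all files can also be scripted. The Python sketch below is one possible way to do it, not what was actually done in Notepad++. The replacement table is purely hypothetical (it swaps em dashes for spaced en dashes, as suggested earlier in the thread), and the function names and file extensions are assumptions. Work on a copy of the unzipped epub, since files are rewritten in place.

```python
# Sketch of a batch find-and-replace over the HTML files of an unzipped epub.
# REPLACEMENTS is a hypothetical example table; adjust it to the actual need.
from pathlib import Path

REPLACEMENTS = {
    "\u2014": " \u2013 ",  # em dash -> en dash with spaces (example only)
}


def replace_in_text(text, replacements):
    """Apply each old -> new replacement in turn and return the result."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def replace_in_tree(root, replacements, suffixes=(".html", ".xhtml")):
    """Rewrite matching files under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            original = path.read_text(encoding="utf-8")
            updated = replace_in_text(original, replacements)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

Reading and writing explicitly as UTF-8 matters here: letting the platform default encoding take over is one way this kind of script could introduce the very problem it is meant to fix.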
|
07-12-2011, 02:20 AM | #7 |
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
|
So all files should have the xhtml extension?
|
07-12-2011, 02:58 AM | #8 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
No, the extension doesn't matter. The type of file is set by the doctype definition, which is correctly specifying xhtml version 1.1 with utf-8 encoding. Everything's fine.
|
07-12-2011, 03:29 AM | #9 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
Actually, what makes them XHTML is the OPF file. Open it up and search for xhtml :-), you'll see what I mean. But yes, there's nothing to worry about. And if you _are_ getting that wrong, it should show up when you run epubcheck, because epub doesn't allow normal html.
I was just trying to figure out why Firefox didn't get the right character encoding. As charleski pointed out, it might have been a configuration issue. But it was worth pointing out that your test with Firefox was out-of-spec, unless you're specifically trying to produce "polyglot" markup that works as both syntaxes, for some peculiar reason. Right now, you're using markup which only works in XHTML (again, this is fine for epub), namely the <?xml version="1.0" encoding="UTF-8" standalone="no"?> declaration. For this specific issue, though, Firefox _should_ autodetect the character encoding anyway, unless there's another problem or Firefox is misconfigured. |
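For readers who would rather not search the OPF by eye, the point about the OPF file deciding the content type can be checked with a few lines of Python. This is a sketch under assumptions: the sample package document is invented for illustration (it is not the original poster's file), and `manifest_media_types` is a hypothetical helper name. The namespace is the standard OPF one.

```python
# Sketch: list the media-type that an OPF manifest declares for each file.
# SAMPLE_OPF is an invented example, not taken from the thread.
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
</package>"""


def manifest_media_types(opf_bytes):
    """Map each manifest item's href to its declared media-type."""
    root = ET.fromstring(opf_bytes)
    return {
        item.get("href"): item.get("media-type")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }


for href, media_type in manifest_media_types(SAMPLE_OPF).items():
    print(f"{href}: {media_type}")
```

In the sample, a file with an .html extension is still declared as application/xhtml+xml, which illustrates why the extension alone does not settle the question.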
Tags |
epub, special characters |