MobileRead Forums > E-Book Formats > ePub

Old 07-11-2011, 04:04 AM   #1
virtual_ink
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
Best practices: Special characters

Is it true that simple characters (single and double quotes, en dash, em dash, ellipsis, etc) will display differently on different e-readers?

I ask this as when previewing my epub's html files in Firefox, these characters display as bad code (eg: em dash = –, single quote = ‘, sq close = ’), however they display fine in ADE and on the iPad using iBooks.

For the sake of compatibility across the board, would it be best to convert all characters to their HTML code equivalents? Is this a necessary process? What would be the best way to do it? (I'm thinking a find-all/replace-all for each character would be one approach, but perhaps there is a better way.)

NB: I am using InDesign CS5.5 (IDCS5.5) to export to epub, but I am guessing this question would apply to others using a different program for epub creation. Correct me if I am wrong and I will add "ID" to the thread title.
Old 07-11-2011, 04:39 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
It should not be necessary. If you declare everything as UTF-8, it usually goes well. Of course, the character has to be in the font of the reader app...
Old 07-11-2011, 04:42 AM   #3
sourcejedi
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
Nope, all those characters are pretty safe. What you're seeing is mojibake. You're using UTF-8, but the browser is decoding it as Latin-1 (ish). This is entirely plausible with epub. Your... content.opf file is serving the HTML files as

application/xhtml+xml; charset=utf-8

but obviously you're not asking your browser to read the OPF file, only the HTML file.
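A minimal sketch of how that kind of mojibake arises, for anyone who wants to reproduce it. It assumes "Latin-1 (ish)" means Windows-1252 (Python's `cp1252` codec), which is what browsers commonly substitute for Latin-1; the `misdecode` helper is a made-up name for illustration.

```python
# Typographic characters of the kind mentioned in the first post.
samples = {
    "em dash": "\u2014",
    "en dash": "\u2013",
    "left single quote": "\u2018",
    "right single quote": "\u2019",
    "ellipsis": "\u2026",
}

def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")

for name, char in samples.items():
    # Each character is three bytes in UTF-8, so it comes out as three
    # unrelated characters when every byte is decoded on its own.
    print(f"{name}: {char!r} -> {misdecode(char)!r}")
```

The bytes in the file are correct the whole time; only the decoding step is wrong, which is why nothing in the epub itself needs to be converted.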

It's possible your browser is defaulting to Latin-1 (ish). In which case, get a better browser to test with. Firefox will auto-detect compliant UTF-8.

The other obvious possibility is that your HTML files are lying. They may contain a <meta> tag which declares it as Latin-1 or similar. (ISO- and a numeric code). Anything that expects XML will ignore that, but browsers which expect HTML will obey it.
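One way to check for a lying `<meta>` tag is a quick scan of the markup. This is only a sketch: the regular expression is a rough approximation rather than a real HTML parser, and `declared_meta_charset` is a hypothetical helper name.

```python
import re
from typing import Optional

# Finds charset=... inside a <meta> tag. Covers both the http-equiv
# form and the shorter <meta charset="..."> form.
META_CHARSET = re.compile(
    r"<meta\b[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)

def declared_meta_charset(markup: str) -> Optional[str]:
    """Return the charset named in the first <meta> tag that has one, else None."""
    match = META_CHARSET.search(markup)
    return match.group(1).lower() if match else None

# Hypothetical example of a file that claims Latin-1 in its <meta> tag.
suspect = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />'
print(declared_meta_charset(suspect))
```

If this reports anything other than `utf-8` for a file that is actually saved as UTF-8, a browser treating the file as HTML is likely to show the garbled characters described above.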

Finally, a technical note.

XHTML and HTML are actually different syntaxes. In HTML4 and below, they're technically incompatible, but browser-HTML is compatible. In HTML5, compatibility is possible. In both cases, complying with both HTML and XHTML imposes some extra restrictions. (See "polyglot markup" for the current draft recommendations).

E.g. you're supposed to stick to UTF-8, because that's the default for XML, and the declaration to specify a different encoding is not HTML-compatible. So no going insane and switching to obsolete encodings like UTF-16 :-).

If you want to make life easier for yourself, you'd be better off at least using the EPUBReader extension for Firefox. Then you can open the EPUB, Firefox will read your OPF file, and it should just work without having to change anything.

Second note: all the characters you mentioned will _display_ correctly, but there's a caveat with em dashes. Most dedicated e-readers are too dumb to break lines at em dashes, so you get very long words, which interfere with justification (assuming you use justification). Some people prefer to avoid them, and use en dashes with spaces instead.
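If you do decide to swap them, the replacement is easy to script. A rough sketch, assuming you run it over the text of each file; `spaced_en_dashes` is a made-up helper name, and it does not try to tell body text apart from markup.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

def spaced_en_dashes(text: str) -> str:
    """Replace each em dash, plus any spaces or tabs around it, with a spaced en dash."""
    return re.sub(rf"[ \t]*{EM_DASH}[ \t]*", f" {EN_DASH} ", text)

print(spaced_en_dashes("It was late\u2014too late."))
```

Existing spaces around the em dash are absorbed first, so you don't end up with doubled spaces. A dash at the very end of a string will be left with a trailing space.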

Third note: Apparently IDC5.5 is much better than previous editions, but people still end up having to look carefully at & tweak the generated XML. So you may well end up having to fix their code (although I would be surprised if they've managed to screw up basic character encoding for no good reason).
Old 07-11-2011, 04:51 AM   #4
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Check that the character encoding in Firefox is set to Auto-Detect - Universal, or alternatively UTF-8. If it's set to anything else then it won't render the characters properly. ePub readers should all handle UTF-8 so I wouldn't worry about it.
Old 07-11-2011, 10:17 PM   #5
virtual_ink
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
That's a relief, thanks all for your help!

Sourcejedi, I've actually been wondering why my epubs are made up of html files instead of xhtml files. I am exporting from ID, unzipping using Stuffit Expander, and each file has the html extension by default.

If I look at the source, each file starts with:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 //EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

Does that ensure I'm working in xhtml, or should my files also be using the xhtml extension?
Old 07-12-2011, 01:31 AM   #6
wannabee
Media Bloke
Posts: 2,381
Karma: 113956855
Join Date: Sep 2010
Location: NSW - Australia
Device: iOS
I had this hassle exporting from IDCS4. All manner of helpful hints from the forum wouldn't solve it. I used Notepad++ to run a find and replace across ALL files in the directory. Since using CS5 I haven't had the problem. If you actually track down the cause, please post it here.
Old 07-12-2011, 02:20 AM   #7
virtual_ink
Zealot
Posts: 107
Karma: 1000
Join Date: Sep 2010
Location: Melbourne, Australia
Device: iPad2, Kindle
So all files should have the xhtml extension?
Old 07-12-2011, 02:58 AM   #8
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
No, the extension doesn't matter. The type of file is set by the doctype definition, which is correctly specifying xhtml version 1.1 with utf-8 encoding. Everything's fine.
Old 07-12-2011, 03:29 AM   #9
sourcejedi
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
Actually, what makes them XHTML is the OPF file. Open it up and search for xhtml :-), you'll see what I mean. But yes, there's nothing to worry about. And if you _are_ getting that wrong, it should show up when you run epubcheck, because epub doesn't allow normal html.
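If you'd rather not search by hand, here is a small sketch that lists what the OPF manifest declares for each file. The sample OPF below is invented for illustration (it is not from your book), and `manifest_media_types` is a hypothetical helper; the namespace is the standard OPF package namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def manifest_media_types(opf_xml: str) -> dict:
    """Map each manifest item's href to its declared media-type."""
    root = ET.fromstring(opf_xml)
    return {
        item.get("href"): item.get("media-type")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }

# Hypothetical, stripped-down OPF: a real one also has metadata and a spine.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
</package>"""

for href, media_type in manifest_media_types(sample_opf).items():
    print(href, "->", media_type)
```

Note that the file in the sample keeps its `.html` extension while the manifest still declares it as `application/xhtml+xml`; the reader goes by the declared type.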

I was just trying to figure out why Firefox didn't get the right character encoding. As charleski pointed out, it might have been a configuration issue. But it was worth pointing out that your test with Firefox was out-of-spec.

Unless you're specifically trying to produce "polyglot" markup that works as both syntaxes, for some peculiar reason. Right now, you're using markup which only works in XHTML (again, this is fine for epub) -

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

(but for this specific issue, Firefox _should_ autodetect the character encoding anyway, unless there's another problem, or Firefox is misconfigured).
All times are GMT -4.

