#1 |
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
EPUB2 and the DOCTYPEgate
Hi
There is currently a difference between calibre's Editor and Sigil. The Editor systematically strips the DOCTYPE, and Kovid Goyal writes that the DOCTYPE is required only when there are named entities (like nbsp). I am personally quite happy with this, but I hear people arguing against it on almost "moral" grounds: it is illegal, it does not respect such-and-such a standard, and so on.

I would like to find the reference document that allows this usage. The problem for me is that every document I find refers to another one, and so on. I am sure this is old news, because some years ago writer2xhtml already offered an option to produce an EPUB2 (EPUB3 did not exist at that time) using only UTF-8 or UTF-16 characters, like the Editor today (see screenshot).

I am talking about ordinary EPUB2, not EPUB3, HTML5 or anything else. I found that the XHTML 1.1 specs say the DOCTYPE is required (screenshot 2). But there is another spec, for XHTML 2, which says that it MAY be required (screenshot 3) and is only necessary "when the character encoding of the document is other than the default UTF-8 and UTF-16", but it is a draft. Which one applies to EPUB 2? How old is the XHTML 2 specification?

The IDPF specifies for EPUB 2.0.1 (May 2010) that documents with the MIME media type application/xhtml+xml must be XHTML 1.1. That would mean that a DOCTYPE is required. http://www.idpf.org/epub/20/spec/OPS...m#Section1.3.4
Last edited by roger64; 02-23-2014 at 05:44 AM.
#2 |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
As far as I could understand, the DOCTYPE is indeed not needed.
|
#3 | |
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
|
|
#4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
It is a bit technical, but according to the XHTML spec the DOCTYPE is required. An ePUB, however, must contain XML files, not XHTML per se. That said, pure XML files will not work as far as I know...
|
#5 |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
It says it must be XML, with all elements (not in islands) from XHTML.
And then of course there are the specifications, and there are the implementations.
#6 |
A curiosus lector!
Posts: 463
Karma: 2015140
Join Date: Jun 2012
Device: Sony PRS-T1, Kobo Touch
|
Roger,
I did a quick test with a file created and saved in Sigil (I'm new to the Calibre eBook-Edit module). I then imported the file into Calibre eBook-Edit, applied "Beautify all files", saved it, and loaded the EPUB on my Kobo and Sony: both displayed the files perfectly. So it seems, from a practical point of view and at least for my Kobo and Sony, that both approaches work as expected. And, by the way, the files passed epubcheck without a hitch.
#7 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
|
#8 | |
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks all for your input on this. The expression "some believe" in the wiki is a bit unclear...
Now let's come back to the full original statement from Kovid Goyal. Quote:
- wrong, because formally EPUB 2.0.1, which is the latest iteration of the official standard before EPUB 3, still relies on the XHTML 1.1 specs, for which the DOCTYPE is an absolute requirement. Period.
- right, because technically this DOCTYPE is only useful for named entities. So even if the DOCTYPE is formally required, it is not technically needed if all the content is made of (UTF-16) Unicode characters.

I agree with Arios: there seem to be absolutely no compatibility problems arising from removing the DOCTYPE, as long as the EPUB 2 contains only (UTF-16) Unicode characters.
Last edited by roger64; 02-24-2014 at 02:33 AM.
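The technical half of that argument can be checked with any plain XML parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the fragments and the `parses` helper are made up for illustration, and the internal DTD subset merely stands in for the entity declarations that the real XHTML DTD would supply. It says nothing about how a particular reading system behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, none of which carries the XHTML DOCTYPE.
NAMED = "<p>a&nbsp;b</p>"     # named entity: only defined by a DTD
NUMERIC = "<p>a&#160;b</p>"   # numeric character reference: plain XML
LITERAL = "<p>a\u00a0b</p>"   # the no-break space character itself

# Assumption for illustration: an internal DTD subset declaring the entity,
# standing in for what the external XHTML DTD would normally provide.
WITH_DECLARATION = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def parses(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for label, markup in [
    ("named entity, no DOCTYPE", NAMED),
    ("numeric reference, no DOCTYPE", NUMERIC),
    ("literal character, no DOCTYPE", LITERAL),
    ("named entity, entity declared", WITH_DECLARATION),
]:
    print(f"{label}: {'ok' if parses(markup) else 'parse error'}")
```

Only the first fragment is rejected: without a declaration the parser has no way to know what `&nbsp;` stands for, while numeric references and literal characters need no DTD at all.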
|
#9 | |
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
|
|
#10 | |
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
I have corrected it above.
Last edited by roger64; 02-24-2014 at 02:38 AM.
|
#11 | ||
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Quote:
Named entity: &eacute; (needs DOCTYPE)
Decimal entity: &#233;
Hexadecimal entity: &#xe9; ("x" means it's hexadecimal, where "e" means fourteen, so "e9" = 14*16^1 + 9*16^0 = 233)
Unicode character: é

The Unicode character may actually be stored as utf-8 or utf-16, but that's mostly invisible to the user:
In utf-8 it will be saved as: C3 A9 (two bytes)
In utf-16 it will be saved as: 00 E9 (two bytes)
Last edited by Jellby; 02-24-2014 at 03:44 AM.
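For anyone who wants to verify the list above, here is a small sketch in Python. It uses the standard `html.unescape` helper, which knows the HTML named entities, purely as a convenient decoder; the variable names are illustrative only.

```python
import html

# Three ways of writing the same character as a reference, plus the literal.
named = html.unescape("&eacute;")
decimal = html.unescape("&#233;")
hexadecimal = html.unescape("&#xe9;")
literal = "\u00e9"

# All four spellings denote the same code point, U+00E9.
assert named == decimal == hexadecimal == literal

# "e9" read as a hexadecimal number: 14*16 + 9.
print(int("e9", 16))                      # 233

# The stored bytes depend on the encoding chosen for the file.
print(literal.encode("utf-8").hex())      # c3a9
print(literal.encode("utf-16-be").hex())  # 00e9 (big-endian, no BOM)
print(literal.encode("utf-16-le").hex())  # e900 (little-endian, no BOM)
```

Note that the "00 E9" byte order corresponds to big-endian utf-16; a little-endian file stores the same two bytes swapped, and a file saved simply as "utf-16" normally starts with a byte order mark that tells the reader which one it is.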
||
#12 | |
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
IMHO, there's no real benefit in using utf-16 for ePubs or anything else for that matter. |
|
#13 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
|
Surely that overhead more or less disappears when the files are zipped into the epub file?
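One way to get a feel for this is to compress the same text in both encodings. The sketch below uses Python's `zlib`, which implements DEFLATE, the compression method normally used inside an EPUB's zip container. The sample text is invented and far more repetitive than a real book, so the numbers it prints only indicate the direction of the effect, not what a real EPUB would show.

```python
import zlib

# Invented, highly repetitive sample; real book text compresses less well.
sample = "The quick brown fox jumps over the lazy dog. " * 200

utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-le")  # little-endian, no byte order mark


def deflated_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


print("raw utf-8:      ", len(utf8))
print("raw utf-16:     ", len(utf16))
print("deflated utf-8: ", deflated_size(utf8))
print("deflated utf-16:", deflated_size(utf16))
```

For ASCII-only text the raw utf-16 file is exactly twice the size of the utf-8 one; how much of that gap survives compression is best measured on the actual book files.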
|
#14 | |
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
AFAIK, utf-16 only needs to be used with certain APIs that expect utf-16 formatted strings and for the proper handling of some CJK characters. |
|
#15 | |
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thank you for your information regarding characters and entities. It's a little like the ruins of Troy: you find yet another layer when you start digging...
Quote:
Last edited by roger64; 02-24-2014 at 09:18 AM.
|