Old 02-23-2014, 03:35 AM   #1
roger64
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
EPUB2 and the DOCTYPEgate

Hi

There is currently a difference between calibre's Editor and Sigil. The Editor systematically strips the DOCTYPE, and Kovid Goyal writes that the DOCTYPE is required only when there are named entities (like nbsp).

I am personally quite happy with this, but I hear people arguing against it on almost "moral" grounds: it is illegal, it does not respect the standard, and so on.

I would like to find the reference document that says this is allowed. The problem is that every document I find refers to another one, and so on.

I am sure this is old news, because some years ago writer2xhtml already offered an option to produce an EPUB2 (EPUB3 did not exist at that time) using only UTF-8 or UTF-16 characters, like the Editor today (see screenshot).

I am talking about ordinary EPUB2, not EPUB3, HTML5 or anything else. I found that the XHTML 1.1 spec says the DOCTYPE is required (screenshot 2). But there is another one, for XHTML 2, which says only that it MAY be required (screenshot 3) and is only necessary "when the character encoding of the document is other than the default UTF-8 and UTF-16", but it's a draft.

Which one applies to EPUB 2? How old is the xhtml 2 specification?

For EPUB 2.0.1 (May 2010), the IDPF specifies that documents with the MIME media type application/xhtml+xml must be XHTML 1.1. That would mean that a DOCTYPE is required.
http://www.idpf.org/epub/20/spec/OPS...m#Section1.3.4
Attached Thumbnails: UTF16.png, xhtml 1.1.png, xhtml 2.0.png

Last edited by roger64; 02-23-2014 at 05:44 AM.
Old 02-23-2014, 05:06 AM   #2
Jellby
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
As far as I could understand, the DOCTYPE is indeed not needed.
Old 02-23-2014, 05:25 AM   #3
Doitsu
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Jellby View Post
As far as I could understand, the DOCTYPE is indeed not needed.
But isn't the ePub 2.0.1 standard based on the second edition of the older XHTML 1.1 standard, which does require a DOCTYPE?
Old 02-23-2014, 05:37 AM   #4
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
It is a bit technical, but the XHTML spec does require the DOCTYPE. However, an ePub must contain XML files, not necessarily XHTML. That said, pure XML files will not work as far as I know...
Old 02-23-2014, 06:17 AM   #5
Jellby
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
It says it must be XML, with all elements (not in islands) from XHTML.

And then, of course, there are the specifications and there are the implementations.
Old 02-23-2014, 03:14 PM   #6
Arios
A curiosus lector!
Posts: 463
Karma: 2015140
Join Date: Jun 2012
Device: Sony PRS-T1, Kobo Touch
Roger,

I did a quick test with a file created and saved in Sigil (I'm new to the calibre eBook-Edit).

Then I imported the file into the calibre eBook-Edit module, applied "Beautify all files", saved the file and loaded the EPUB on my Kobo and Sony: both displayed the files perfectly.

So it seems, from a practical point of view at least for my Kobo and Sony, that both specs work as expected.

And, by the way, the files passed epubcheck without a hitch.
Old 02-23-2014, 08:12 PM   #7
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Take a look at our wiki https://wiki.mobileread.com/wiki/DOCTYPE

Dale
Old 02-23-2014, 11:14 PM   #8
roger64
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Thanks, all, for your input on this. The expression "some believe" in the wiki is a bit unclear...

Now let's come back to the full original statement from Kovid Goyal:

Quote:
Originally Posted by kovidgoyal View Post
Unicode characters work in every single place that entities work. Both unicode characters and entities require a declaration in the header. The DOCTYPE in the case of named entities and the character encoding in the case of unicode characters.

The DOCTYPE is an absolute requirement in order to use named entities in XHTML, while the character encoding is not, since the default encoding for XHTML is UTF-8 when undeclared, which is the encoding the calibre editor uses.

Therefore, named entities are actually *less* likely to work than unicode characters.
From what I can see, the second paragraph of this statement is ambiguous because it's both right and wrong.

- wrong, because formally, EPUB 2.0.1, which is the latest iteration of the official standard before EPUB 3, still relies on the XHTML 1.1 spec, for which the DOCTYPE is an absolute requirement. Period.

- right, because technically, this DOCTYPE is only useful for named entities. So even if the DOCTYPE is required formally, it is not needed technically if all the code is made up of (UTF-16) Unicode characters.

I agree with Arios: there seem to be absolutely no compatibility problems arising from this removal of the DOCTYPE, as long as the EPUB 2 contains only (UTF-16) Unicode characters.
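
For anyone who wants to check the "technical" half of this without a reader device, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It is only an illustration: the sample fragments are made up, and the internal DTD subset in the last sample is a stand-in for the declarations the XHTML DTD would provide, not the real XHTML 1.1 DOCTYPE. A reading system may of course behave differently from this parser.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample fragments; none of the first four has a DOCTYPE.
named = "<p>caf&eacute;</p>"      # named entity, declared nowhere
numeric = "<p>caf&#233;</p>"      # numeric character reference
literal = "<p>caf\u00e9</p>"      # the Unicode character itself
predefined = "<p>a &amp; b</p>"   # one of XML's five built-in entities

# Same named entity, this time declared in an internal DTD subset
# (an assumption standing in for the XHTML DTD).
declared = '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'

samples = [
    ("named", named),
    ("numeric", numeric),
    ("literal", literal),
    ("predefined", predefined),
    ("declared", declared),
]
for label, sample in samples:
    print(f"{label:10} parses: {parses(sample)}")
```

Only the undeclared named entity is rejected; numeric references, literal characters and the built-in XML entities need no declaration at all.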

Last edited by roger64; 02-24-2014 at 02:33 AM.
Old 02-24-2014, 02:03 AM   #9
Doitsu
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by roger64 View Post
So even if the DOCTYPE is required formally, it is not needed technically if all the code is made out of UTF-16 characters.
I agree with Arios: there seems to be absolutely no compatibility problems arising from this suppression of the DOCTYPE as long as the EPUB 2 contains only UTF-16 characters.
According to the ePub 2.01 specs, files can be encoded either as utf-8 or utf-16 files. I.e., there's no need to use utf-16 files just because you want to avoid including a DOCTYPE. Also, by definition, utf-16 files are at least twice the size of utf-8 files and offer no advantages for languages that use the Latin alphabet.
Old 02-24-2014, 02:31 AM   #10
roger64
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Doitsu View Post
According to the ePub 2.01 specs, files can be encoded either as utf-8 or utf-16 files. I.e., there's no need to use utf-16 files just because you want to avoid including a DOCTYPE. Also, by definition, utf-16 files are at least twice the size of utf-8 files and offer no advantages for languages that use the Latin alphabet.
Thank you for this interesting info. I only mentioned UTF-16 because I had read (where?) that the calibre editor used hexadecimal Unicode characters. I knew no more than that. It would have been better to write simply: Unicode characters.

I have corrected it above.

Last edited by roger64; 02-24-2014 at 02:38 AM.
Old 02-24-2014, 03:37 AM   #11
Jellby
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by roger64 View Post
- wrong, because formally, EPUB 2.0.1, which is the latest iteration of the official standard before EPUB 3, still relies on the XHTML 1.1 spec, for which the DOCTYPE is an absolute requirement. Period.
I don't see that. Where is it stated that ePub documents must be valid XHTML 1.1? As I said above, they must be valid XML, with XHTML.

Quote:
Originally Posted by roger64 View Post
I just spoke about UTF-16 because I had read (where?) that calibre editor used hexadecimal unicode characters. I knew no more about this. It would have been better to write only: Unicode characters.
You are mixing several things:

Named entity: &eacute; (needs DOCTYPE)
Decimal entity: &#233;
Hexadecimal entity: &#xe9; ("x" means it's hexadecimal, where "e" means fourteen, so "e9" = 14*16^1+9*16^0 = 233)
Unicode character: é

The Unicode character may be actually stored as utf-8 or utf-16, but that's mostly invisible to the user:
In utf-8 it will be saved as: C3 A9 (two bytes)
In utf-16 it will be saved as: 00 E9 (two bytes)
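
If you want to verify these numbers yourself, a small Python sketch along these lines should do. It uses html.unescape only as a convenient way to resolve the references (it is not what a reading system does internally), and "utf-16-be" is chosen so that the bytes come out in the same order as written above, without a byte order mark.

```python
import html

char = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

# Three ways of writing the same character as a reference
named = html.unescape("&eacute;")
decimal = html.unescape("&#233;")
hexadecimal = html.unescape("&#xe9;")

# "e9" read as a hexadecimal number: e = 14, so 14*16 + 9
code_point = int("e9", 16)

# How the character itself is stored on disk
utf8_bytes = char.encode("utf-8")
utf16_bytes = char.encode("utf-16-be")  # big-endian, no byte order mark

print(named == decimal == hexadecimal == char)
print(code_point, ord(char))
print(utf8_bytes.hex(" ").upper())
print(utf16_bytes.hex(" ").upper())
```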

Last edited by Jellby; 02-24-2014 at 03:44 AM.
Old 02-24-2014, 04:08 AM   #12
Doitsu
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Jellby View Post
The Unicode character may be actually stored as utf-8 or utf-16, but that's mostly invisible to the user:
In utf-8 it will be saved as: C3 A9 (two bytes)
In utf-16 it will be saved as: 00 E9 (two bytes)
However, since in utf-16 all characters need to be encoded with at least two bytes, utf-16 files can be up to twice the size of utf-8 files (if they only contain characters from the Basic Latin block).

IMHO, there's no real benefit in using utf-16 for ePubs or anything else for that matter.
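
To put numbers on the size difference, here is a quick sketch in Python. The sample strings are arbitrary, and "utf-16-le" is used so that no byte order mark is added to the count.

```python
def encoded_sizes(text):
    """Return (utf-8 size, utf-16 size) in bytes, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Arbitrary sample strings
latin = "The quick brown fox jumps over the lazy dog."  # Basic Latin only
accented = "caf\u00e9"                                   # one non-ASCII letter
cjk = "\u65e5\u672c\u8a9e"                               # three CJK characters

latin_sizes = encoded_sizes(latin)
accented_sizes = encoded_sizes(accented)
cjk_sizes = encoded_sizes(cjk)

for label, sizes in [("latin", latin_sizes),
                     ("accented", accented_sizes),
                     ("cjk", cjk_sizes)]:
    print(f"{label:9} utf-8: {sizes[0]:3} bytes   utf-16: {sizes[1]:3} bytes")
```

For pure Basic Latin text the utf-16 size is exactly double; the ratio shrinks as soon as the text contains characters that need more than one byte in utf-8.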
Old 02-24-2014, 06:41 AM   #13
SBT
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Quote:
Originally Posted by Doitsu View Post
However, since in utf-16 all characters need to be encoded with at least two bytes, utf-16 files can be up to twice the size of utf-8 files (if they only contain characters from the Basic Latin block).
Surely that overhead more or less disappears when the files are zipped into the epub file?
Old 02-24-2014, 07:52 AM   #14
Doitsu
Grand Sorcerer
Posts: 5,724
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by SBT View Post
Surely that overhead more or less disappears when the files are zipped into the epub file?
It does indeed more or less disappear when the files are zipped, but since the reading software has to unzip the .html files in order to render the text and some older readers have problems with .html files significantly larger than 280 KB, IMHO, there's no point in using utf-16 in the first place.

AFAIK, utf-16 only needs to be used with certain APIs that expect utf-16 formatted strings and for the proper handling of some CJK characters.
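
The zipped-versus-unzipped point can be sketched in Python with zlib, which implements the same Deflate compression that zip containers normally use. The sample document is made up and very repetitive, so it compresses far better than a real book would; treat the output as an illustration of the trend, not as a measurement.

```python
import zlib

# Hypothetical, highly repetitive sample document (Basic Latin only)
paragraph = "<p>It was the best of times, it was the worst of times.</p>\n"
document = paragraph * 500

raw8 = document.encode("utf-8")
raw16 = document.encode("utf-16-le")  # no byte order mark

zip8 = zlib.compress(raw8)
zip16 = zlib.compress(raw16)

print(f"unzipped  utf-8: {len(raw8):6} bytes   utf-16: {len(raw16):6} bytes")
print(f"deflated  utf-8: {len(zip8):6} bytes   utf-16: {len(zip16):6} bytes")
```

The reader still has to hold the unzipped file in memory, so the first line of the output is the one that matters for per-file size limits.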
Old 02-24-2014, 09:14 AM   #15
roger64
Wizard
Posts: 2,624
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Thank you for your information regarding characters and entities. It's a little like the ruins of Troy: you keep finding another layer when you start digging...

Quote:
Originally Posted by Jellby View Post
I don't see that. Where is it stated that ePub documents must be valid XHTML 1.1? As I said above, they must be valid XML, with XHTML.
This statement makes me uneasy. Look at how the Editor behaves with valid XHTML 1.1. I produce EPUB2 from a converter with valid XHTML 1.1 files. As soon as I open this EPUB2 with the Editor, the DOCTYPEs are beheaded (and named or numbered entities transformed). It's not XML but pure XHTML though...
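
To illustrate why the two changes go together, here is a rough Python sketch of the kind of transformation involved. This is not calibre's actual code: the function name and the regular expression are my own assumptions, and the entity table is Python's HTML 4 list from html.entities. The idea is simply that once every named entity (other than XML's five built-in ones) has been replaced by the character itself, nothing in the file depends on the DOCTYPE any more.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser knows without any DTD; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &eacute; but not numeric ones like &#233;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_entities_to_unicode(markup):
    """Replace DTD-dependent named entities with the characters they stand for."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)      # keep &amp;, &lt;, ... untouched
        code_point = name2codepoint.get(name)
        if code_point is None:
            return match.group(0)      # unknown name: leave it alone
        return chr(code_point)
    return ENTITY_RE.sub(replace, markup)


# Hypothetical sample fragment
sample = "<p>caf&eacute;&nbsp;&amp; th&eacute;</p>"
converted = named_entities_to_unicode(sample)
print(converted)
```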

Last edited by roger64; 02-24-2014 at 09:18 AM.