Old 02-19-2013, 08:58 PM   #1
carlosbcg
Member
Posts: 23
Karma: 10
Join Date: Feb 2013
Device: Linux laptop and iPhone
Confused! XHTML, HTML, HTML5, EPUB2, EPUB3???

Hi everyone,

This is my first post. I normally try to figure things out myself, as that is faster than trying to get help somewhere, but after spending countless hours reading and trying to figure this out, I thought it wouldn't hurt to pick some brains on this.

Here is what I think I know so far...(I am working on my first book and wanting to create it as an EPUB3).

EPUB2 uses CSS2 and HTML inside. But, and here is where it gets a bit confusing right off the bat, the HTML must be in XHTML format but having an extension of...well...HTML?

EPUB3 uses CSS3 and HTML5 but it too must be in XHTML format?? If so what brand of XHTML? And the extension of the HTML5 (or is it XHTML files) must be...hmm...HTML again?

Can one create an EPUB3 file with plain ol HTML5 and CSS3 without the XHTML business?

What is the difference between XHTML and HTML5?

Is HTML5 just the latest and greatest HTML while XHTML is...I dunno?

Among the hats I have worn in the past is that of a web developer so I am very familiar with CSS and HTML (even PHP) but hardly at all with XHTML and HTML5.

Any insight on any of this gobbledygook that anyone might care to share with me would be most appreciated.

Thanks!

Carlos
Old 02-19-2013, 09:24 PM   #2
carlosbcg
Member
Posts: 23
Karma: 10
Join Date: Feb 2013
Device: Linux laptop and iPhone
Here is more confusing goodness

I just read this at IBM developerWorks...

"if you're migrating from an EPUB 2 to EPUB 3 workflow, consider starting by converting from existing NCX documents. Because both input and output documents are XML, this is a perfect application for XSLT."

What in the world is XML and XSLT??

So now we have HTML, HTML5, XHTML, EPUB2, EPUB3, XML, and XSLT!

Sigh.

Carlos
Old 02-20-2013, 12:35 AM   #3
Turtle91
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Welcome to the Forum!

It can be confusing at first. But I'll give a very basic overview and then point you to where you can get some good info.

XHTML is just HTML with a stricter adherence to some rules. For example - you are required to have proper closing tags on ALL of your elements, and attribute names need to be lower case. As long as you follow the rules you can name it .html and no one is the wiser. The .xhtml is/was just a way to tell the difference for software programs that cared.
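To make the "proper closing tags" rule concrete, here is a small Python sketch using only the standard library. It is just an illustration, not part of any EPUB tool: the helper name `is_well_formed` and the two sample strings are made up, and it only checks XML well-formedness, not whether the markup is valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Browser-style HTML: the <br> is never closed.
html_style = "<p>First line<br>second line</p>"
# XHTML-style: the <br/> closes itself.
xhtml_style = "<p>First line<br/>second line</p>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

A browser will happily show both strings, but an XML parser (which is what an EPUB reader or validator is expected to use) rejects the first one.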

CSS2 is the list of properties that you can use to style HTML elements. CSS3 is the newer list with extended capabilities/properties. Depending on the particular reader App/Device, any, none, or all of the CSS3 list may be supported. If you want the widest possible support for your epub, you should use ePub2 and stick to the CSS2 list.

HTML5 is just the next generation of HTML. The previous version was 4.01; HTML5 adds some new capabilities/tags.

ePub2 is what almost all ebooks are written in now - because the reader Apps/Devices don't yet support the advanced functionality in the new version ePub3. This is changing - slowly. Kindle, iBooks, Kobo, are among the few who are STARTING to support the functionality - but they are using their own hybrid...so that's a mess.

The ncx file is what ePub 2 uses to navigate through the documents - think Table of Contents. ePub 3 uses more of an HTML based document for its navigation.
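For the curious, the ncx-to-HTML-navigation idea can be sketched in a few lines of Python with the standard library. Treat this as a rough illustration under assumptions: the sample NCX, the file names, and the helper names `ncx_entries` and `build_nav` are all made up, and it only handles top-level entries (no nested chapters, no page lists).

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Made-up two-chapter NCX, just for the demonstration.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="ch1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
    <navPoint id="ch2" playOrder="2">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="chapter2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def ncx_entries(ncx_text):
    """Return (label, href) pairs for the top-level navPoints of an NCX."""
    ns = {"ncx": NCX_NS}
    root = ET.fromstring(ncx_text)
    entries = []
    for point in root.findall("ncx:navMap/ncx:navPoint", ns):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=ns)
        href = point.find("ncx:content", ns).get("src")
        entries.append((label.strip(), href))
    return entries


def build_nav(entries):
    """Build an ePub 3 style <nav epub:type="toc"> element from the pairs."""
    ET.register_namespace("", XHTML_NS)
    ET.register_namespace("epub", EPUB_NS)
    nav = ET.Element(f"{{{XHTML_NS}}}nav", {f"{{{EPUB_NS}}}type": "toc"})
    ol = ET.SubElement(nav, f"{{{XHTML_NS}}}ol")
    for label, href in entries:
        li = ET.SubElement(ol, f"{{{XHTML_NS}}}li")
        a = ET.SubElement(li, f"{{{XHTML_NS}}}a", {"href": href})
        a.text = label
    return nav


entries = ncx_entries(SAMPLE_NCX)
print(entries)
print(ET.tostring(build_nav(entries), encoding="unicode"))
```

The resulting nav element would still have to be placed inside a full XHTML page and listed in the package file before a reader could use it.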

XML - don't know...I avoid it!

XSLT - don't know about that either, but from the context I would gather it is an interpretation program from one format to another??

OK...pretty quick and dirty.

The W3Schools website has some great tutorials that can explain all of this much better, and they have reference pages as well to show you what all the different tags mean and how they are used.

http://www.w3schools.com/html/default.asp

I hope that helps!

Cheers,
Old 02-20-2013, 12:36 AM   #4
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
From the top:

SGML stands for Standard Generalized Markup Language. There are many SGMLs. Crudely put, an SGML is any markup language that is characterized by any arbitrary set of tags surrounded by angle brackets, with certain bits at the beginning to tell you what type of file it is, and there are probably a few other rules.

XML is a strict subset of SGMLs in which, among other things, all tags must be matched with a close tag, and a few other details. There are many dialects of XML (an XML dialect is basically just a specific set of allowed tags that can be nested in specific ways), including DocBook, XHTML, property lists, and so on.

HTML is an example of an SGML. HTML has a specific set of tags that are considered valid. HTML is not, however, based on XML, because some tags do not have to be closed at all (hr, script when a URL is provided, and so on), and some tags auto-close at the right time (p, li, etc.). [Edit: And, as Turtle91 pointed out, HTML specifies case-insensitive tag and attribute parsing, whereas XML specifies case-sensitive tag and attribute parsing, which, in the case of XHTML's built-in tags and attributes, translates to "all lowercase".]
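The case-sensitivity point can be seen with Python's standard library, if you want to poke at it. This is only a sketch: the `TagCollector` class and the sample strings are made up for the demonstration, and `html.parser` is a lenient tokenizer rather than a full browser parser, so it reports tags without building a tree.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record the start tags that the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Upper-case tags, unclosed paragraphs, a bare <HR>: fine for HTML.
sloppy = "<P>one<P>two<HR>"

collector = TagCollector()
collector.feed(sloppy)
collector.close()
print(collector.tags)  # ['p', 'p', 'hr']

# XML treats <P> and </p> as two different names, so this does not parse.
try:
    ET.fromstring("<P>one</p>")
    xml_accepts_mixed_case = True
except ET.ParseError:
    xml_accepts_mixed_case = False
print(xml_accepts_mixed_case)  # False
```

The HTML side folds everything to lower case and carries on, while the XML side stops at the first mismatch.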

HTML5 is a specific version of HTML. Like all HTMLs, it is an SGML, but HTML5 files are not (necessarily) proper XML.

XHTML is a special form of HTML that has been modified slightly so that every XHTML file is a proper XML file that conforms to the stricter XML standards. This requires a few tiny tweaks around the fringes, but it mostly looks like HTML with some extra close tags or self-closing tags.

XSLT is another XML dialect. An XSLT stylesheet provides a set of rules for transforming from one XML dialect to another (typically, though in practice, it can be used to translate a specified XML dialect into pretty much anything, up to and including LaTeX commands).
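One way to see that an XSLT stylesheet really is just XML is to load one with an ordinary XML parser. The sketch below does that in Python. The stylesheet and its book/chapter/title vocabulary are invented for the example, and note the limits: Python's standard library has no XSLT processor, so this only reads the stylesheet, it does not run it. Actually applying it takes an XSLT processor, for example the xsltproc command-line tool or the one built into a browser.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A made-up stylesheet that would turn a hypothetical <book> document
# into a bare-bones HTML page.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/book">
    <html><body><xsl:apply-templates select="chapter"/></body></html>
  </xsl:template>
  <xsl:template match="chapter">
    <h1><xsl:value-of select="title"/></h1>
  </xsl:template>
</xsl:stylesheet>"""

# Because the stylesheet is XML, a plain XML parser can load it.
root = ET.fromstring(STYLESHEET)

# Each template says "when you see something matching this pattern,
# output that"; the patterns themselves are XPath expressions.
patterns = [t.get("match") for t in root.findall(f"{{{XSL_NS}}}template")]
print(patterns)  # ['/book', 'chapter']
```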

EPUB2 and EPUB3 are versions of EPUB. EPUB2 uses XHTML under the hood. EPUB3 uses HTML5, but it must be parseable as XML. So it must be a polyglot XML/HTML5 document. This polyglot is called XHTML5, but is defined as part of the HTML5 standard rather than in a separate standard as previous XHTML versions were.
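As a rough illustration of what "HTML5, but parseable as XML" looks like, here is a minimal chapter file checked with Python's standard XML parser. The chapter text is made up, and passing this check only shows the file is well-formed XML; it is not a substitute for running a real EPUB validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# HTML5 elements (section, meta charset) written with XML rules:
# a namespace on <html>, every tag closed, lower-case names.
CHAPTER = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Chapter 1</title>
  </head>
  <body>
    <section>
      <h1>Chapter 1</h1>
      <p>First paragraph.<br/>Second line.</p>
    </section>
  </body>
</html>"""

root = ET.fromstring(CHAPTER)  # raises ParseError if not well-formed
title = root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")

print(root.tag)  # {http://www.w3.org/1999/xhtml}html
print(title)     # Chapter 1
```

Drop the slash from `<br/>` or `<meta .../>` and a browser would still render the page, but the parse above would fail.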

Clear as mud?

Last edited by dgatwood; 02-20-2013 at 12:39 AM.
Old 02-20-2013, 02:44 AM   #5
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
That about sums it up. You can consider XSLT as the CSS for XML files.

Be aware that there are some more differences with regards to XHTML and HTML, but not earth-shaking. The extension is not important, it is the first line in the document that tells the renderers (e.g. browsers) how to interpret the document.
The content documents in an ePUB must be XHTML. If you don't need JavaScript/audio/video and stuff like that, I would advise creating an ePUB2 instead of an ePUB3.
Old 02-20-2013, 06:02 AM   #6
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
To add to your misery, NO device fully supports any of the above.

So start with something simple, even a few lines. View it in a reader for whatever device you are aiming at and go from there. Epubs containing mostly text are going to have the fewest problems, so that might be a place to start. Sigil is a great program that handles many of the details for you, if you have an html file to feed it, or you can type in directly or paste in text. Many start with Word, export as filtered html, or use a macro by Toxaris available here, then open in Sigil to finish things off.

There is a library here which has thousands of books. You can open the files that make up the epubs with calibre or sigil, or unzip them and display them in any text editor you like. Just make sure you do not re-zip them yourself without knowing a few further arcane rules.
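One of those arcane re-zipping rules, as far as I know the best-known one, is that the file named mimetype has to be the first entry in the zip, stored without compression, and contain exactly application/epub+zip. Here is a hedged Python sketch of writing and checking a container that way. The helper names `write_epub` and `mimetype_ok` are made up, and the placeholder file below is not a working book, it is only there to have something to zip.

```python
import io
import zipfile


def write_epub(target, files):
    """Write an EPUB-style zip; `files` maps archive paths to bytes."""
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry goes first and is stored, not compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Everything else may be compressed as usual.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


def mimetype_ok(source):
    """Check that the first zip entry is an uncompressed, correct mimetype."""
    with zipfile.ZipFile(source) as zf:
        first = zf.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and zf.read("mimetype") == b"application/epub+zip")


# Build a throwaway archive in memory with placeholder content.
buf = io.BytesIO()
write_epub(buf, {"META-INF/container.xml": b"<placeholder/>"})
print(mimetype_ok(buf))  # True
```

An ordinary "zip the folder" from a file manager usually gets the order or the compression wrong, which is why tools like Sigil or calibre are the safer way to save.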
Old 02-20-2013, 06:26 AM   #7
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Regarding CSS, neither ePub 2 nor ePub 3 supports all of CSS2 or CSS3. They both support some subset of them with some additional properties.

But "support" here simply means that compliant readers are required to know some properties and values (not always to do anything useful with them). Unfortunately, no reader actually supports everything it is required to. And then any reader is allowed to support additional properties.

Since CSS is designed to ignore unknown properties, this means you should be able to use whatever you want in the CSS (as long as it's syntactically correct), but can never be sure of the effect it will have on a reader without trying. I mean, the resulting ePub will be valid, but may not work exactly as intended.
Old 02-20-2013, 07:22 AM   #8
DiapDealer
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
And for the record, I don't think the extension of the (x)html(5) files really matters at all--as long as the content is compliant and they're manifested properly.
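"Manifested properly" here refers to the manifest in the package (.opf) file, where each content file is listed with a media type. A small Python sketch of reading that is below. The sample is a cut-down fragment with made-up file names, not a complete package file, and `content_documents` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Fragment only: a real package file also needs metadata and a spine.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
</package>"""


def content_documents(opf_text):
    """Return the hrefs the manifest declares as XHTML content."""
    root = ET.fromstring(opf_text)
    return [item.get("href")
            for item in root.findall("opf:manifest/opf:item", OPF_NS)
            if item.get("media-type") == "application/xhtml+xml"]


print(content_documents(SAMPLE_OPF))  # ['chapter1.html', 'chapter2.xhtml']
```

Both chapter files come back as XHTML content even though one is named .html and the other .xhtml, because the declared media type is what counts, not the extension.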
Old 02-20-2013, 01:40 PM   #9
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You can also read about all of these things in our wiki. And as DiapDealer said, don't be fooled by a file extension. The content is what matters, not the extension, particularly for all the HTML and XML variations.

Dale
Old 02-20-2013, 02:30 PM   #10
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
I am not quite sure he is less confused now...
Old 02-20-2013, 02:41 PM   #11
Turtle91
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Quote:
Originally Posted by Toxaris View Post
I am not quite sure he is less confused now...
HE might not be less confused...but I certainly am. I know for a FACT that I'm crazy to get involved with this stuff!
Old 02-20-2013, 04:26 PM   #12
twobits
Addict
Posts: 223
Karma: 1057019
Join Date: Oct 2010
Device: none
Quote:
Originally Posted by dgatwood View Post
From the top:

SGML stands for Standard Generalized Markup Language. There are many SGMLs. Crudely put, an SGML is any markup language that is characterized by any arbitrary set of tags surrounded by angle brackets, with certain bits at the beginning to tell you what type of file it is, and there are probably a few other rules.
There is only one SGML actually. It is an ISO standard now and descended from GML. Overall though this was a pretty good summary, except you missed one key piece.

DTD, or Document Type Definition. This defines which tags, and which rules for them, make up a valid document of that document type.

Quote:
XML is a strict subset of SGMLs in which, among other things, all tags must be matched with a close tag, and a few other details. There are many dialects of XML (an XML dialect is basically just a specific set of allowed tags that can be nested in specific ways), including DocBook, XHTML, property lists, and so on.
It is not a dialect of XML but a DTD for XML.

Quote:
HTML is an example of an SGML. HTML has a specific set of tags that are considered valid. HTML is not, however, based on XML, because some tags do not have to be closed at all (hr, script when a URL is provided, and so on), and some tags auto-close at the right time (p, li, etc.). [Edit: And, as Turtle91 pointed out, HTML specifies case-insensitive tag and attribute parsing, whereas XML specifies case-sensitive tag and attribute parsing, which, in the case of XHTML's built-in tags and attributes, translates to "all lowercase".]
At first HTML was only modeled on SGML, but was more ad hoc than SGML allowed. It was not until later (4.0 or 3.2, I can't recall which offhand) that it was given a formal DTD that made it true SGML.

Quote:
HTML5 is a specific version of HTML. Like all HTMLs, it is an SGML, but HTML5 files are not (necessarily) proper XML.

XHTML is a special form of HTML that has been modified slightly so that every XHTML file is a proper XML file that conforms to the stricter XML standards. This requires a few tiny tweaks around the fringes, but it mostly looks like HTML with some extra close tags or self-closing tags.
Right about XHTML, but it is probably worth noting that XHTML is simply a DTD for XML.

Quote:
XSLT is another XML dialect. An XSLT stylesheet provides a set of rules for transforming from one XML dialect to another (typically, though in practice, it can be used to translate a specified XML dialect into pretty much anything, up to and including LaTeX commands).
Actually XSLT is a Turing-complete language. To use it you also need to learn XPath, which XSLT uses to select parts of the document (XQuery is a related but separate language).

Quote:
EPUB2 and EPUB3 are versions of EPUB. EPUB2 uses XHTML under the hood. EPUB3 uses HTML5, but it must be parseable as XML. So it must be a polyglot XML/HTML5 document. This polyglot is called XHTML5, but is defined as part of the HTML5 standard rather than in a separate standard as previous XHTML versions were.

Clear as mud?
Damn alphabet soup ! I hate XML! lol
Old 02-20-2013, 04:29 PM   #13
twobits
Addict
Posts: 223
Karma: 1057019
Join Date: Oct 2010
Device: none
Quote:
Originally Posted by Toxaris View Post
That about sums it up. You can consider XSLT as the CSS for XML files.
Not properly. CSS defines how to render/display the document. XSLT defines how to transform it from one DTD to another DTD or type.
Old 02-20-2013, 10:13 PM   #14
carlosbcg
Member
Posts: 23
Karma: 10
Join Date: Feb 2013
Device: Linux laptop and iPhone
Quote:
Originally Posted by Turtle91 View Post
Welcome to the Forum!

It can be confusing at first. But I'll give a very basic overview and then point you to where you can get some good info.
Thanks very much for taking the time to post that Turtle (or maybe Dion?).

The only part that seemed a tad contradictory to me is where you said you don't know what XML is and don't do anything with it, given that I have discovered that both EPUB2 and EPUB3 use XML for a couple of crucial files.

But other than that what you said made sense.

Carlos
Old 02-20-2013, 10:24 PM   #15
carlosbcg
Member
Posts: 23
Karma: 10
Join Date: Feb 2013
Device: Linux laptop and iPhone
Quote:
Originally Posted by dgatwood View Post
Clear as mud?
Ahhh...well...mostly LOL.

You did an excellent job of explaining things.

There were, however, a couple of spots where things are still a wee bit confusing, if you or someone else could expand and explain a bit more.

Specifically...

Quote:
HTML5 is a specific version of HTML. Like all HTMLs, it is an SGML, but HTML5 files are not (necessarily) proper XML.
Hmm...but...but...don't the EPUB3 internal files that hold the actual content of an ebook as HTML5 (albeit with an extension of HTML) have to be what is termed "serialized XHTML" (not altogether sure what that means, but I think it means pretty much XHTML)?

In other words don't EPUB3 content internals HAVE to be the XHTML variant of the HTML5?

I am creating my ebook in EPUB3, by the way, following the lead of O'Reilly. I figure if it's good enough for them it's good enough for me.

Quote:
XSLT is another XML dialect. An XSLT stylesheet provides a set of rules for transforming from one XML dialect to another (typically, though in practice, it can be used to translate a specified XML dialect into pretty much anything, up to and including LaTeX commands).
Hmm...interesting. I take it then that XSLT is completely unnecessary to the creation of an EPUB?

But just out of curiosity...how exactly does an XSLT file with XML commands in it get executed to do its conversion work? Does a browser execute the XSLT commands or something?

Quote:
EPUB2 and EPUB3 are versions of EPUB. EPUB2 uses XHTML under the hood. EPUB3 uses HTML5, but it must be parseable as XML. So it must be a polyglot XML/HTML5 document. This polyglot is called XHTML5, but is defined as part of the HTML5 standard rather than in a separate standard as previous XHTML versions were.
That's quite the deep sentence there.

What is a polyglot document? Do you mean a document which has both XML and HTML5?

So are XHTML5 and HTML5 the same thing? I mean if XHTML5 is defined as part of the HTML5 standard I mean and not separately like in the past?

So can I refer to the 5 thing as either XHTML5 OR HTML5?

Any further clarification from you or anyone else would be appreciated.

I think I am finally beginning to make heads or tails of this.

Carlos