MobileRead Forums - View Single Post - epubcheck, any software can pass its validiation check?

kovidgoyal · 03-12-2010, 01:33 PM

Allow me to construct a little gedanken experiment to illustrate things for people who need examples to understand abstract reasoning:

Say we want to create an EPUB e-book. Say it has 50 chapters, each of which is in a nice separate XHTML file. Now suppose we've edited all the files and seen that they look really pretty in a nice WYSIWYG HTML editor. Now it's time to test our EPUB. Imagine two parallel realities.

In reality A, our EPUB creator (let's call him Mr. X) was dropped on his head when he was a baby and so has an unholy reverence for XML schemas and thinks they are the cure for all evils.

In reality B, our EPUB creator was dropped on his head twice, so he's forgotten all his reverence, holy or unholy.

Finally, let's stipulate that Mr. X has a finite amount of time, say two hours, to check his EPUB file before reaching his publishing deadline.

Reality A

Mr. X immediately fires up epubcheck, because he believes, with all the fervor of a true believer, that if epubcheck passes his EPUB, he's golden.

epubcheck spits out lots and lots of lines, say about 1000, that look like this:

ERROR: mybook.epub(8): could not parse content/index_cr_2.html: duplicate id: top
and this message is repeated 50 times.

Then there are messages like

ERROR: mybook.epub(12): attribute "name" not allowed at this point; ignored

repeated, say, 1000 times.

So our conscientious Mr. X is horrified. "My God, my EPUB will never work," he says. So he spends an hour googling to figure out what those error messages actually mean. He realises he has to, horror of horrors, actually use a text editor to fix those problems. So he painstakingly edits each file by hand (he's never learned to use regexps) and by dint of sheer determination manages to make all these error messages go away. He now has 30 seconds left to meet his deadline. But epubcheck says all is well, so he quickly fires up Firefox and submits his book to an online distributor. The distributor in turn runs epubcheck, which says nothing, and so the book is released to an unsuspecting public.
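For what it's worth, the hand-editing above is the kind of job a regexp does in seconds. A minimal sketch in Python, assuming the "name" errors come from `name` attributes on `<a>` tags (the post does not say which element triggers them); `drop_name_attributes` is a made-up helper name, not part of epubcheck or calibre:

```python
import re

# Assumption: the offending attribute sits on <a> tags and is quoted.
# Matches the start of an <a ...> tag up to one name="..." attribute.
NAME_ATTR = re.compile(
    r'(<a\b[^>]*?)\s+name\s*=\s*(?:"[^"]*"|\'[^\']*\')',
    re.IGNORECASE,
)


def drop_name_attributes(xhtml):
    """Return the markup with the name attribute removed from each <a> tag."""
    # Keep everything before the attribute (group 1), drop the attribute itself.
    return NAME_ATTR.sub(r"\1", xhtml)
```

Note that this only silences the validator. If any link targets a `name` that has no matching `id`, removing it breaks that link, which is exactly the kind of thing epubcheck passing will not tell you.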

The next day Mr. X finds his inbox filling up with emails from disappointed readers. One guy says he tried to open the EPUB in Stanza and none of the links in it worked. Another guy says he tried to use the table of contents on his Sony Reader and it took half an hour to load. A third guy says that on his PDA the text in some chapters runs off the screen. A fourth guy complains that the text on the dedication page seems to start at the middle of the page and run off the right.

epubcheck really saved the day for Mr. X.

Switch to reality B

Mr. X saves his EPUB file, then immediately opens it in, say, the calibre viewer, where everything looks good. But being wise, he then opens it in the desktop ADE, where he sees the dedication page is wrong and the text runs off the screen in a few chapters where he has used a <table> to lay out text. So he quickly fires up his editor and fixes the table, replacing it with a simple linear layout. Then he googles a bit and learns that ADE doesn't support text-anchor="middle" for SVG text (even though it is perfectly valid as per the SVG spec), so he changes his dedication page to use simple text.

Then he opens his EPUB on his Sony Reader and finds the table of contents takes forever to load. He googles some more and learns that, for some weird reason, if he uses anchors in the toc.ncx, the Sony Reader preloads all the files before rendering the table of contents. So he quickly goes and removes the anchors, which are rather useless (though perfectly valid) because all his chapters are in separate files anyway.
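Removing the anchors from toc.ncx can be scripted too. A rough sketch using only the Python standard library, assuming the anchors are `#fragment` suffixes on the `src` attribute of the NCX `content` elements; `strip_ncx_anchors` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX documents.
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def strip_ncx_anchors(ncx_text):
    """Return the NCX markup with '#fragment' removed from every content src."""
    # Serialize NCX elements without an ns0: prefix.
    ET.register_namespace("", NCX_NS)
    root = ET.fromstring(ncx_text)
    for content in root.iter(f"{{{NCX_NS}}}content"):
        src = content.get("src", "")
        # Keep only the file part of "file.html#anchor".
        content.set("src", src.split("#", 1)[0])
    # Note: ElementTree does not preserve a DOCTYPE declaration on output.
    return ET.tostring(root, encoding="unicode")
```

This is only safe in Mr. X's situation, where every chapter lives in its own file. If several entries point into the same file, stripping the fragments makes them all land at the top of that file.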

Phew, now his book is looking like it might work. But then he remembers Stanza, so he loads his file onto an iPod and discovers that none of the hyperlinks work; they all take him to incorrect locations. Horrified, he googles some more and figures out that Stanza can't handle files in different sub-directories that have the same file name. Oh boy! Now he has to rename all his files and fix all the links that point to them. A panicky half an hour later, that's done.
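Spotting the clashing file names before renaming anything is easy to automate, since an EPUB is just a zip archive. A small sketch, again standard-library Python only; `find_duplicate_basenames` is an illustrative name, and the check is based purely on the Stanza behaviour described above:

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_basenames(epub_file):
    """Map each file name used in more than one directory to its full paths."""
    by_name = defaultdict(list)
    with zipfile.ZipFile(epub_file) as zf:
        for member in zf.namelist():
            if member.endswith("/"):  # skip directory entries
                continue
            # Zip member paths always use forward slashes.
            by_name[PurePosixPath(member).name].append(member)
    return {
        name: sorted(paths)
        for name, paths in by_name.items()
        if len(paths) > 1
    }
```

Calling `find_duplicate_basenames("mybook.epub")` returns an empty dict when every file name is unique. It only reports the clashes; the renaming and the link fixing are still up to Mr. X.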

Finally, Mr. X remembers that his best friend (who was dropped on his head only once) recommended he use epubcheck. But he looks at his clock and sees he doesn't have any time left to do that. So he fires up Firefox and submits his file to the distributor. In this reality, the distributor's tech guy is really lazy, so he doesn't have the system set up to run epubcheck. The EPUB is therefore released without any further ado.

The next day Mr. X's inbox fills up with emails from happy readers telling him how his book has changed their lives and how they were able to read it on their iPods, Sony Readers and PDAs, all with no problems whatsoever.

I hope that clears up just why I object to epubcheck. And let me say that if, unlike Mr. X, you have infinite time to proof your EPUBs, by all means use epubcheck.

Last edited by kovidgoyal; 03-12-2010 at 01:37 PM.