epubcheck: can any software pass its validation check?


droople
03-08-2010, 07:07 AM
Hi

I just used epubcheck to validate some epub files made by Calibre, eCub and Sigil. Interestingly, none of them pass the validation check, although all the books can be read on the machine.

Does this mean that actually no one cares about the epub specification?

Cheers

PS: I found that the epub file made by Atlantis passes the validation check.

charleski
03-08-2010, 07:19 AM
A lot of files fail because they're using xhtml that doesn't precisely comply with the relevant specification. Most of these errors are trivial at the moment, but may become non-trivial in the future as the xhtml spec evolves.

The purpose of the spec is to make sure that you'll be able to read your epubs on future devices as well as the ones you have now, so yes, it's a good idea to make sure they're compliant.

Valloric
03-08-2010, 05:09 PM
I just used epubcheck to validate some epub files made by Calibre, eCub and Sigil. Interestingly, none of them pass the validation check.

If an epub you saved with Sigil doesn't pass epubcheck, then it's usually because the XHTML imported into Sigil had problems. If you see an instance where Sigil saves an epub that doesn't pass epubcheck where the original file did, then that's a bug report (of high priority).

I try to make Sigil epubcheck-clean; that is, I try my best to make sure Sigil doesn't introduce non-compliance. But if the original file was non-compliant, there's a good chance Sigil won't be able to make it compliant (although it usually can).

droople
03-08-2010, 10:30 PM
Hi Valloric
Thank you for the reply.

What I did with the Sigil epub file was type "test" in the text field, without importing any external files.

Here is the validation message
ERROR: test.epub/OEBPS/content.opf(6): unfinished element

And please find the attached file.

Cheers

kovidgoyal
03-08-2010, 11:46 PM
Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.

The best that could possibly be said about epubcheck (and I wouldn't be comfortable saying this without actual data) is that a file that passes epubcheck may be more likely than a file that does not pass to render correctly with most epub viewers. Even if this were true it would most likely be so because files that tend to pass epubcheck tend, on average, to have extremely simple markup, as they are typically the product of machine translation from some extremely simple format.

What I'm trying to say is that the things that epubcheck checks are those things that it is easiest to write software to check, not those things that are most likely to cause problems, or those things that are most likely to occur in the wild.

What epubcheck is good for, is those situations where you have absolutely no idea why your epub file is not rendering with a particular renderer. In that case, you can try running epubcheck on it and fix the errors it points out. Of course, that may or may not fix your actual problem. And even for this use case, epubcheck is extremely sub-optimal since its error messages are incredibly unhelpful.

That's my epubcheck jeremiad for this week.

awp
03-09-2010, 12:08 AM
PS: I found that the epub file made by Atlantis passes the validation check.

Any EPUB file generated by Atlantis is supposed to pass the EPUB validation test. (http://atlantiswordprocessor.blogspot.com/2009/10/validating-epubs.html) If you have a document that Atlantis Word Processor does not convert to a "valid EPUB", please email it to support@AtlantisWordProcessor.com.

Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.

It is not completely meaningless. Some publishers do not accept EPUBs that do not pass the EPUB validation test.

awp
03-09-2010, 12:15 AM
Here is the validation message
ERROR: test.epub/OEBPS/content.opf(6): unfinished element

And please find the attached file.


This is because some mandatory metadata items are missing.
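The OPF 2.0 specification marks three Dublin Core items as required in the package metadata: `dc:title`, `dc:identifier` and `dc:language`. A rough way to see which of them an OPF lacks is to parse it yourself. The Python sketch below is not epubcheck's logic (epubcheck validates against a schema, which is why its message is the terse "unfinished element"); it assumes only the OPF and Dublin Core namespace URIs, and `missing_required_metadata` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The Dublin Core items that OPF 2.0 requires in every package.
REQUIRED_DC = ("title", "identifier", "language")


def missing_required_metadata(opf_xml: str) -> list[str]:
    """Return the required dc:* items that are absent or empty in an OPF document."""
    root = ET.fromstring(opf_xml)
    missing = []
    for name in REQUIRED_DC:
        # iter() searches the whole tree, so items nested in an old-style
        # <dc-metadata> wrapper are found as well.
        values = [el.text for el in root.iter(f"{{{DC_NS}}}{name}")]
        if not any(v and v.strip() for v in values):
            missing.append(name)
    return missing
```

Run against a hypothetical minimal OPF whose `<metadata>` holds only `<dc:title>test</dc:title>`, it returns `['identifier', 'language']`, i.e. the kind of gap awp is describing for droople's test file.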

droople
03-09-2010, 01:11 AM
Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.

The best that could possibly be said about epubcheck (and I wouldn't be comfortable saying this without actual data) is that a file that passes epubcheck may be more likely than a file that does not pass to render correctly with most epub viewers. Even if this were true it would most likely be so because files that tend to pass epubcheck tend, on average, to have extremely simple markup, as they are typically the product of machine translation from some extremely simple format.

What I'm trying to say is that the things that epubcheck checks are those things that it is easiest to write software to check, not those things that are most likely to cause problems, or those things that are most likely to occur in the wild.

What epubcheck is good for, is those situations where you have absolutely no idea why your epub file is not rendering with a particular renderer. In that case, you can try running epubcheck on it and fix the errors it points out. Of course, that may or may not fix your actual problem. And even for this use case, epubcheck is extremely sub-optimal since its error messages are incredibly unhelpful.

That's my epubcheck jeremiad for this week.

Oh, ok

kovidgoyal
03-09-2010, 02:03 AM
It is not completely meaningless. Some publishers do not accept EPUBs that do not pass the EPUB validation test.

That just goes to show that those publishers don't have a clue.

droople
03-09-2010, 03:44 AM
That just goes to show that those publishers don't have a clue.

oh, ok.

awp
03-09-2010, 04:21 AM
That just goes to show that those publishers don't have a clue.

Maybe. But it is always safer to follow the specifications. If the OPF specification says that the "title", "identifier" and "language" metadata items are "required", why not include them in any EPUB file?

"Invalid" EPUBs might be accepted by the currently available EPUB readers. But there is no guarantee that they will be accepted by future readers.

pdurrant
03-09-2010, 05:04 AM
It's a design decision not to add random values to the metadata for the required metadata fields.

If you use the metadata editing interface in Sigil, and enter values for the required items, Sigil ePubs pass epubcheck if you also have valid XHTML in the content.

What I did with the Sigil epub file was type "test" in the text field, without importing any external files.

Here is the validation message
ERROR: test.epub/OEBPS/content.opf(6): unfinished element

charleski
03-09-2010, 10:19 AM
Lol, I was expecting Kovid to take the chance to bang on about epubcheck again.

While it's not perfect (and desperately needs another release) I've found it very useful as a basic 'sanity check' to make sure that all the required elements are in the right places. Unfortunately many converters still output deprecated xhtml attributes, which is what causes most of the errors.
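As an illustration of the leftover presentational markup charleski mentions, here is a small Python sketch that flags a handful of HTML 4 Transitional attributes (`align`, `bgcolor`, `clear` and so on) in an XHTML file. The attribute list is an illustrative assumption, not the full XHTML 1.1-based rule set that EPUB 2 content documents are checked against, and `find_presentational_attrs` is a hypothetical helper; a real check would validate against the schema.

```python
import xml.etree.ElementTree as ET

# Illustrative, not exhaustive: presentational attributes from HTML 4
# Transitional that strict XHTML drops.
PRESENTATIONAL_ATTRS = {"align", "bgcolor", "background", "clear", "hspace", "vspace", "nowrap"}
# XHTML 1.1's table module still allows align on these elements.
ALIGN_OK = {"col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"}


def find_presentational_attrs(xhtml: str) -> list[tuple[str, str]]:
    """Return (element, attribute) pairs that a strict XHTML validator would likely reject."""
    hits = []
    for el in ET.fromstring(xhtml).iter():
        local = el.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        for attr in el.attrib:
            if attr not in PRESENTATIONAL_ATTRS:
                continue
            if attr == "align" and local in ALIGN_OK:
                continue
            hits.append((local, attr))
    return hits
```

Results come back in document order, so `<p align="center">` followed by `<br clear="all"/>` yields `[("p", "align"), ("br", "clear")]`, while `align` on a `<td>` is left alone.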

kovidgoyal
03-09-2010, 10:22 AM
Maybe. But it is always safer to follow the specifications. If the OPF specification says that the "title", "identifier" and "language" metadata items are "required", why not include them in any EPUB file?

"Invalid" EPUBs might be accepted by the currently available EPUB readers. But there is no guarantee that they will be accepted by future readers.

Don't get me wrong, I'm not advocating that people ignore the standards on purpose. I just want people to realize that having your book passed by epubcheck is a guarantee of precisely nothing. I think having something like epubcheck does more harm than good, simply because everybody automatically assumes that if you have an EPUB that passes epubcheck's tests, it will work correctly.

epubcheck should have been named epub-schema-check to help make it clear that all it does is validate a few XML schemas.

kovidgoyal
03-09-2010, 10:24 AM
Lol, I was expecting Kovid to take the chance to bang on about epubcheck again.


Yeah, I just can't resist :) There's something about epubcheck's naivete that really gets me going.

pdurrant
03-09-2010, 10:35 AM
I just want people to realize that having your book passed by epubcheck is a guarantee of precisely nothing.

It's a guarantee that it passes that particular version of epubcheck!

On the other hand, failing to pass epubcheck means that 99 times out of 100 there's something wrong with your ePub. Even though it might still render OK in some readers, it would be better to fix the problems.

loveangel
03-09-2010, 10:49 AM
Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.

The best that could possibly be said about epubcheck (and I wouldn't be comfortable saying this without actual data) is that a file that passes epubcheck may be more likely than a file that does not pass to render correctly with most epub viewers. Even if this were true it would most likely be so because files that tend to pass epubcheck tend, on average, to have extremely simple markup, as they are typically the product of machine translation from some extremely simple format.

What I'm trying to say is that the things that epubcheck checks are those things that it is easiest to write software to check, not those things that are most likely to cause problems, or those things that are most likely to occur in the wild.

What epubcheck is good for, is those situations where you have absolutely no idea why your epub file is not rendering with a particular renderer. In that case, you can try running epubcheck on it and fix the errors it points out. Of course, that may or may not fix your actual problem. And even for this use case, epubcheck is extremely sub-optimal since its error messages are incredibly unhelpful.

That's my epubcheck jeremiad for this week.
Got your point!

charleski
03-09-2010, 11:56 AM
While it's not perfect (and desperately needs another release)
I'm happy to say I've just found I'm completely wrong there! :)

epubcheck 1.0.5 was released for download (http://code.google.com/p/epubcheck/downloads/list) on Feb 18, and should contain a host of fixes. No need to use the old Dec '08 version any more. The earlier Threepress blog post (http://blog.threepress.org/2010/02/05/web-based-epubcheck-upgraded-to-epubcheck-1-0-5/) on it left me thinking the actual release was still some time away.

Jellby
03-09-2010, 02:55 PM
I just want people to realize that having your book passed by epubcheck is a guarantee of precisely nothing.

But, as pdurrant says, if the book doesn't pass epubcheck, that's a guarantee of even less. It serves, at least, to catch the errors it catches. For me it has been useful to check I have not forgotten to add any file, that the internal links are consistent, that I didn't leave HTML entities in the NCX file, etc. Failure to pass epubcheck means that, if your book does not display fine, you cannot blame the reader ;)

Of course, I'm saying ¬p -> ¬q; that does not imply p -> q ;)
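Two of the checks Jellby mentions (files left out of the manifest, and manifest entries pointing at files that are not in the package) are easy to approximate. This Python sketch assumes an EPUB 2 layout: it finds the OPF through `META-INF/container.xml`, resolves each manifest `href` relative to the OPF, and compares the result with the ZIP's file list. `manifest_problems` is a hypothetical name, and the sketch does not check internal links or the NCX.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def manifest_problems(epub) -> tuple[list[str], list[str]]:
    """Return (manifest entries with no file in the ZIP, ZIP files absent from the manifest).

    `epub` is anything zipfile.ZipFile accepts: a path or a binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        names = set(zf.namelist())
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(f".//{{{CONTAINER_NS}}}rootfile").get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    opf_dir = posixpath.dirname(opf_path)
    listed = set()
    for item in opf.iter(f"{{{OPF_NS}}}item"):
        # Manifest hrefs are URL-encoded paths relative to the OPF file.
        href = unquote(item.get("href"))
        listed.add(posixpath.normpath(posixpath.join(opf_dir, href)))
    missing = sorted(listed - names)
    # mimetype, META-INF/*, the OPF itself and directory entries never
    # appear in the manifest, so they are not reported.
    unlisted = sorted(
        n for n in names - listed
        if n != "mimetype" and n != opf_path
        and not n.startswith("META-INF/") and not n.endswith("/")
    )
    return missing, unlisted
```

For a package whose OPF lists `ch1.xhtml` and `ch2.xhtml` but whose ZIP contains `ch1.xhtml` and a stray `extra.css`, the function reports `ch2.xhtml` as missing and `extra.css` as unlisted.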

kovidgoyal
03-09-2010, 03:29 PM
Failure to pass epubcheck means that, if your book does not display fine, you cannot blame the reader ;)


Yes you can, your file is most likely not displaying in the reader for reasons that have nothing to do with the errors epubcheck is pointing out.

But maybe my perspective on this issue is somewhat different, since I don't hand create EPUB files. Instead I hand create an e-book in TXT or HTML and auto generate the EPUB from that. So things like checking the NCX and OPF files for validity are not important to me, all that's automatically taken care of :)

Valloric
03-09-2010, 03:39 PM
What I did with the Sigil epub file was type "test" in the text field, without importing any external files.

Here is the validation message
ERROR: test.epub/OEBPS/content.opf(6): unfinished element


This is because some mandatory metadata items are missing.

It's a design decision not to add random values to the metadata for the required metadata fields.

If you use the metadata editing interface in Sigil, and enter values for the required items, Sigil ePubs pass epubcheck if you also have valid XHTML in the content.

What pdurrant said. Emphasis mine.

Valloric
03-09-2010, 03:44 PM
What I'm trying to say is that the things that epubcheck checks are those things that it is easiest to write software to check, not those things that are most likely to cause problems, or those things that are most likely to occur in the wild.

I don't think anyone would disagree with this. Epubcheck will not find the serious errors, but it will find the simple ones. So why not fix the simple ones?

As others have noted, it's a sanity check. Passing epubcheck doesn't mean your file doesn't have problems, but not passing it means it does in 99% of cases.

Again, epubcheck will scream on even trivial things you can safely ignore, but if you're trying to build an error-free epub, why not fix those?

kovidgoyal
03-09-2010, 03:49 PM
Again, epubcheck will scream on even trivial things you can safely ignore, but if you're trying to build an error-free epub, why not fix those?

No reason except that you and I and pdurrant and Jellby recognize that these are trivial errors. Most people don't.

Valloric
03-09-2010, 03:58 PM
No reason except that you and I and pdurrant and Jellby recognize that these are trivial errors. Most people don't.

But you also said that epubcheck is doing "more harm than good". I strongly disagree with that.

Epubcheck is like a spell checker. It will catch things like "teh" instead of "the", but not "they're" instead of "their", or the more serious grammatical errors. But even within its limited capabilities, a spell checker is extremely useful. Sure, there are those (misguided) people who think that just because their document passes a spell checker it is error-free, but that's not the spell checker's fault.

It's the fault of the bozos who don't know better.

And bashing a spell checker because someone else is a bozo is IMO silly.

Jellby
03-09-2010, 04:07 PM
Yes you can, your file is most likely not displaying in the reader for reasons that have nothing to do with the errors epubcheck is pointing out.

In a sense, you are right, the reader's problems are not necessarily related to the problems detected by epubcheck, but if the book is not a valid ePUB file, you shouldn't expect the reader to display it correctly. Make it first a valid ePUB, and you can blame the reader as much as you want ;)

kovidgoyal
03-09-2010, 04:12 PM
But you also said that epubcheck is doing "more harm than good". I strongly disagree with that.

And bashing a spell checker because someone else is a bozo is IMO silly.

But epubcheck is a spell checker that does not claim to be a spell checker. It claims to be a grammar checker. I also said that epubcheck should have been named epub-schema-check or something equivalent.

kovidgoyal
03-09-2010, 04:13 PM
In a sense, you are right, the reader's problems are not necessarily related to the problems detected by epubcheck, but if the book is not a valid ePUB file, you shouldn't expect the reader to display it correctly. Make it first a valid ePUB, and you can blame the reader as much as you want ;)

In my (rather extensive) experience, they're almost never related to problems detected by epubcheck.

droople
03-09-2010, 04:42 PM
But epubcheck is a spell checker that does not claim to be a spell checker. It claims to be a grammar checker. I also said that epubcheck should have been named epub-schema-check or something equivalent.

Do you mean spelling mistakes are not important and you never correct them?

Cheers

droople
03-09-2010, 04:43 PM
I don't think anyone would disagree with this. Epubcheck will not find the serious errors, but it will find the simple ones. So why not fix the simple ones?

As others have noted, it's a sanity check. Passing epubcheck doesn't mean your file doesn't have problems, but not passing it means it does in 99% of cases.

Again, epubcheck will scream on even trivial things you can safely ignore, but if you're trying to build an error-free epub, why not fix those?

I strongly agree. :thumbsup:

kovidgoyal
03-09-2010, 05:44 PM
Do you mean spelling mistakes are not important and you never correct them?

Cheers

No I mean that assuming something is error free because it has no spelling errors is wrong.

rmm200
03-09-2010, 07:27 PM
Epubcheck, like Calibre, is a work in progress. Again like Calibre, I think it is supported by volunteers.

Just because it has limitations is no excuse not to use it. Until something better comes along, I think anyone generating Epubs should use it as the aforementioned sanity check. And if you don't like the way it works - volunteer some of your time to make it better.

Epubs need good tools to reach their potential - and for the moment Epubcheck is the best we have.

Robert

kovidgoyal
03-09-2010, 08:02 PM
My objection to epubcheck is that it gives the impression that using it means your epub is actually guaranteed to work on anything.

To put it another way

I would have no objection to epubcheck if it was named epub-schema-check or if it printed out a big fat disclaimer about how its checks don't guarantee anything.

Or to put it slightly less seriously

Circling a banyan tree three times while reciting the English alphabet backwards appeases the God of e-book formatting, so you should always do that immediately after you run epubcheck. Just in case. It can't hurt. Banyan trees are shady, and good for the environment.

kovidgoyal
03-09-2010, 08:15 PM
@rmm200: Are you implying that the fact that a piece of software is developed by volunteers means that it is bound to be mediocre?

rmm200
03-09-2010, 09:21 PM
@rmm200: Are you implying that the fact that a piece of software is developed by volunteers means that it is bound to be mediocre?

I love Calibre - I think it is great. Last time I looked Calibre had a bunch of closed bugs and a fair number of open ones. Work in progress means you are not done yet... There is always another bug ahead.

Are you saying Epubcheck will never be better than mediocre?

I am not associated with Epubcheck, but I know the great value of tools like FindBugs (http://findbugs.sourceforge.net/) in the Java arena. There are definite analogies here - and I would consider myself irresponsible as a programmer if I did not use the tools available to me.

Robert

kovidgoyal
03-09-2010, 09:31 PM
I love Calibre - I think it is great. Last time I looked Calibre had a bunch of closed bugs and a fair number of open ones. Work in progress means you are not done yet... There is always another bug ahead.

Are you saying Epubcheck will never be better than mediocre?

I am not associated with Epubcheck, but I know the great value of tools like FindBugs (http://findbugs.sourceforge.net/) in the Java arena. There are definite analogies here - and I would consider myself irresponsible as a programmer if I did not use the tools available to me.

Robert

Are you saying you should use a tool simply because it exists, irrespective of whether it is any good or not? I have no idea if epubcheck will one day become useful or not. All I can say is whether it is useful or not today.

In fact I would even happily stipulate it is useful today, I just want to emphasize that it does not perform the function that many people assume it does, that of guaranteeing an EPUB will work.

So an EPUB creator's job doesn't really end when her EPUB is passed by epubcheck. She needs to actually look at it in a few common EPUB viewers as well.

As an illustration, 90% of the code in the calibre EPUB output plugin deals with transforming perfectly valid (as per epubcheck) structures to other perfectly valid structures. Except that the second set of structures have the advantage that they actually work as intended on common EPUB renderers.

JSWolf
03-09-2010, 11:42 PM
mypictureshare.com is way too slow. Whoever is using it, please unlink from it. Thanks.

Peter Sorotokin
03-10-2010, 12:41 AM
Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.

The best that could possibly be said about epubcheck (and I wouldn't be comfortable saying this without actual data) is that a file that passes epubcheck may be more likely than a file that does not pass to render correctly with most epub viewers. Even if this were true it would most likely be so because files that tend to pass epubcheck tend, on average, to have extremely simple markup, as they are typically the product of machine translation from some extremely simple format.

What I'm trying to say is that the things that epubcheck checks are those things that it is easiest to write software to check, not those things that are most likely to cause problems, or those things that are most likely to occur in the wild.

What epubcheck is good for, is those situations where you have absolutely no idea why your epub file is not rendering with a particular renderer. In that case, you can try running epubcheck on it and fix the errors it points out. Of course, that may or may not fix your actual problem. And even for this use case, epubcheck is extremely sub-optimal since its error messages are incredibly unhelpful.

That's my epubcheck jeremiad for this week.

Even though I do not totally agree with what you say, I can certainly see your point and confirm that there is a lot of truth in it. When you hit a specific problem that epubcheck should have caught in your opinion, could you make a note of it and either post this stuff somewhere, or send me a message, or create an issue for epubcheck? This way it can be improved to be a more useful tool.

kovidgoyal
03-10-2010, 01:11 AM
Sure, I'll be happy to. Most issues I come up against are typically limitations of this or that EPUB viewer software though. I'm not sure if these are the kinds of issues you want epubcheck to check. If so, I'll be happy to open bug reports for it. Perhaps epubcheck should get a --rendering mode that can be used to check issues that affect rendering on various EPUB viewers.

There's already a long list of these types of issues affecting Adobe Digital Editions at http://bugs.calibre-ebook.com/wiki/ADEQuirks. Most of these issues have probably been fixed in newer ADE releases, but the existence of legacy readers means that EPUB documents need to work around them to be sure of rendering well on old SONY readers, for example.

pdurrant
03-10-2010, 05:09 AM
Even though I do not totally agree with what you say, I can certainly see your point and confirm that there is a lot of truth in it. When you hit a specific problem that epubcheck should have caught in your opinion, could you make a note of it and either post this stuff somewhere, or send me a message, or create an issue for epubcheck? This way it can be improved to be a more useful tool.

I was pleased to find that the latest 1.0.5 epubcheck no longer reports problems on IDPF-obfuscated fonts. It still reports problems on Adobe-obfuscated fonts, which is reasonable, I suppose, because the Adobe method is non-standard.

Of course, (nearly) all ePub readers understand Adobe-obfuscated fonts, and the most important one, Adobe Digital Editions, doesn't understand IDPF-obfuscated ones. Which makes the situation less than optimal.

Sigh... I just want to publish an ePub with an embedded font in a way that satisfies the font vendors, and yet doesn't encrypt the text. An obfuscated font is the way to go, but using Adobe obfuscation means the file won't pass epubcheck, and so my distributor won't take it, and using IDPF obfuscation passes epubcheck, but then the font won't show up for the majority of (current) users!
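Both obfuscation schemes pdurrant is juggling are simple XOR transforms over the start of the font file, which is why a reader that only knows one cannot use a font obfuscated with the other. The Python sketch below follows my understanding of them, and the details should be treated as assumptions to verify against the OCF specification: the IDPF method XORs the first 1040 bytes with the SHA-1 digest of the package's unique identifier (with whitespace removed), and Adobe's method XORs the first 1024 bytes with the 16 raw bytes of a `urn:uuid:` identifier. The function names are hypothetical, and the `META-INF/encryption.xml` entry that has to accompany an obfuscated font is not shown.

```python
import hashlib
import uuid

# Algorithm URIs recorded in META-INF/encryption.xml (assumed values).
IDPF_ALGORITHM = "http://www.idpf.org/2008/embedding"
ADOBE_ALGORITHM = "http://ns.adobe.com/pdf/enc#RC"


def idpf_key(unique_identifier: str) -> bytes:
    """SHA-1 of the unique identifier after removing space, tab, CR and LF."""
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def adobe_key(unique_identifier: str) -> bytes:
    """The 16 raw bytes of a UUID; uuid.UUID accepts the urn:uuid: prefix."""
    return uuid.UUID(unique_identifier).bytes


def xor_prefix(font: bytes, key: bytes, length: int) -> bytes:
    """XOR the first `length` bytes with `key` repeated; the rest is untouched."""
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font[:length]))
    return head + font[length:]


def idpf_obfuscate(font: bytes, unique_identifier: str) -> bytes:
    return xor_prefix(font, idpf_key(unique_identifier), 1040)


def adobe_obfuscate(font: bytes, unique_identifier: str) -> bytes:
    return xor_prefix(font, adobe_key(unique_identifier), 1024)
```

Because XOR is its own inverse, calling the same function on an obfuscated font restores the original, while running the Adobe variant over an IDPF-obfuscated font produces neither the original nor a usable font, which is the compatibility problem described above.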

Hopefully ADE will support IDPF-obfuscated fonts soon.

(Hmm... I seem to be trying to hijack the thread. Apologies.)

Valloric
03-10-2010, 08:29 AM
Even though I do not totally agree with what you say, I can certainly see your point and confirm that there is a lot of truth in it. When you hit a specific problem that epubcheck should have caught in your opinion, could you make a note of it and either post this stuff somewhere, or send me a message, or create an issue for epubcheck? This way it can be improved to be a more useful tool.

I personally find epubcheck to be an invaluable tool even after taking its limitations into consideration. The only thing I don't like about it is that it's written in Java, and therefore can't be easily embedded into Sigil. But that's my problem, not epubcheck's.

Keep up the good work.


Sigh... I just want to publish an ePub with an embedded font in a way that satisfies the font vendors, and yet doesn't encrypt the text. An obfuscated font is the way to go, but using Adobe obfuscation means the file won't pass epubcheck, and so my distributor won't take it, and using IDPF obfuscation passes epubcheck, but then the font won't show up for the majority of (current) users!

Oh the pain. That's horrible. :D

Peter Sorotokin
03-10-2010, 12:34 PM
Of course, (nearly) all ePub readers understand Adobe-obfuscated fonts, and the most important one, Adobe Digital Editions, doesn't understand IDPF-obfuscated ones. Which makes the situation less than optimal.

The most recent Adobe SDK does implement IDPF method, so you'll see devices that support it pretty soon. But yes, it will take time to filter down.

Peter Sorotokin
03-10-2010, 12:38 PM
Sure, I'll be happy to. Most issues I come up against are typically limitations of this or that EPUB viewer software though. I'm not sure if these are the kinds of issues you want epubcheck to check. If so, I'll be happy to open bug reports for it. Perhaps epubcheck should get a --rendering mode that can be used to check issues that affect rendering on various EPUB viewers.

Frankly speaking, I wanted to add such a mode for a long time, but I was concerned that it would be seen as some sort of evil Adobe plot. Perhaps I'll give it another thought.

pdurrant
03-10-2010, 12:42 PM
The most recent Adobe SDK does implement IDPF method, so you'll see devices that support it pretty soon. But yes, it will take time to filter down.

Excellent news. Thanks.

kovidgoyal
03-10-2010, 12:51 PM
Frankly speaking, I wanted to add such a mode for a long time, but I was concerned that it would be seen as some sort of evil Adobe plot. Perhaps I'll give it another thought.

Well it should not only warn about problems with ADE. For example, empty <pre> elements are not rendered correctly in WebKit-based viewers, and Stanza can't handle an epub that has files with the same filename in different directories.

Of course, Stanza has so many problems that it's probably not worth supporting at all.
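A rendering-oriented check of the kind Kovid proposes might start with the two problems he names. The Python sketch below is hypothetical (it is not an existing epubcheck mode, and all function names are made up for illustration): it reports ZIP entries that share a filename across directories, and counts `<pre>` elements with no text and no children in each XHTML file. The thread does not establish whether a `<pre>` holding only whitespace also triggers the WebKit problem, so the sketch does not flag that case.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from collections import defaultdict

XHTML_NS = "http://www.w3.org/1999/xhtml"


def duplicate_basenames(names: list[str]) -> dict[str, list[str]]:
    """Group ZIP entries that share a filename but live in different directories."""
    groups = defaultdict(list)
    for name in names:
        if not name.endswith("/"):  # skip directory entries
            groups[posixpath.basename(name)].append(name)
    return {base: paths for base, paths in groups.items() if len(paths) > 1}


def count_empty_pre(xhtml: bytes) -> int:
    """Count <pre> elements that have no text and no child elements."""
    root = ET.fromstring(xhtml)
    return sum(1 for pre in root.iter(f"{{{XHTML_NS}}}pre") if not pre.text and len(pre) == 0)


def rendering_warnings(epub) -> list[str]:
    """`epub` is a path or binary file object accepted by zipfile.ZipFile."""
    warnings = []
    with zipfile.ZipFile(epub) as zf:
        for base, paths in duplicate_basenames(zf.namelist()).items():
            warnings.append(f"{base} appears in several directories: {', '.join(paths)}")
        for name in zf.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                n = count_empty_pre(zf.read(name))
                if n:
                    warnings.append(f"{name}: {n} empty <pre> element(s)")
    return warnings
```

Keeping the two checks as separate helpers makes it easy to add further viewer-specific quirks (such as the ADE list above) as more small functions without touching the ZIP-walking code.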

JSWolf
03-10-2010, 01:41 PM
The most recent Adobe SDK does implement IDPF method, so you'll see devices that support it pretty soon. But yes, it will take time to filter down.

But unless Sony implements the most recent SDK for the 300, 500, 505, 600, & 700, the publishers cannot use the new fixes/changes, as the books will then break on a Sony Reader. I'm not sure how up-to-date the 900 is. But Sony needs to keep the ADE engine updated. That they haven't is silly. Do you know if Sony is planning on implementing the latest ADE on the Readers, or is Sony making the world work with the old version despite changes/fixes?

I do think Adobe should make it part of the contract that if they use ADE, that it has to be updated so we are not stuck with old ADE when new ADE has bug fixes and other enhancements. Sure, the new ADE supports IDPF font handling. But if a number of Readers do not, then what good is it?

JSWolf
03-10-2010, 01:42 PM
Well it should not only warn about problems with ADE. For example, empty <pre> elements are not rendered correctly in WebKit-based viewers, and Stanza can't handle an epub that has files with the same filename in different directories.

Of course, Stanza has so many problems that it's probably not worth supporting at all.

We get to blame Amazon for that, for buying Stanza and stopping work on it.

ATimson
03-10-2010, 02:06 PM
I do think Adobe should make it part of the contract that if they use ADE, that it has to be updated so we are not stuck with old ADE when new ADE has bug fixes and other enhancements. Sure, the new ADE supports IDPF font handling. But if a number of Readers do not, then what good is it?
If Adobe were willing to foot the bill for creating and supporting any such upgrades, that might happen. But if they're going to force updates, I think we'd instead see device makers rolling their own engines - which is Not A Good Thing.

ATimson
03-10-2010, 02:07 PM
We get to blame Amazon for that, for buying Stanza and stopping work on it.
The last release was in December 2009, eight months after the acquisition. If they stopped all work on Stanza, they certainly took their time about it...

charleski
03-10-2010, 02:49 PM
My objection to epubcheck is that it gives the impression that using it means your epub is actually guaranteed to work on anything.

Well, with respect Kovid, where does it give that impression? Anyone who is capable of thinking, 'My book passes a number of elementary compliance checks, therefore there's nothing wrong with it,' needs to go back to school and learn basic logic.

There may well be people in the publishing trade who suffer from such a delusion, but you can't blame epubcheck for the fact that some of the people using it are fools.

If Adobe were willing to foot the bill for creating and supporting any such upgrades, that might happen. But if they're going to force updates, I think we'd instead see device makers rolling their own engines - which is Not A Good Thing.
That's not going to happen for a variety of reasons, not least of which are the legal issues that would surround a 3rd party trying to re-engineer Adobe's DRM decryption. The principal problem is that manufacturers of embedded devices tend not to budget for ongoing support costs - once the box is out the door they turn their attention to the next device because they make money by shifting boxes.

I had hoped that in the wake of Sony's (rather expensive) ePub upgrade program for the PRS500 they would release much cheaper firmware upgrades for the 505-700 as well to keep them up-to-date. So far nothing, but we live in hope.

ATimson
03-10-2010, 03:08 PM
That's not going to happen for a variety of reasons, not least of which the legal issues that would surround a 3rd party trying to re-engineer Adobe's DRM decryption.
My thinking was that they'd roll their own encryption, rather than reverse-engineer Adobe's, hence it being Not A Good Thing (for me, at least ;)).

DaleDe
03-10-2010, 06:24 PM
I had hoped that in the wake of Sony's (rather expensive) ePub upgrade program for the PRS500 they would release much cheaper firmware upgrades for the 505-700 as well to keep them up-to-date. So far nothing, but we live in hope.

Actually it was quite the opposite. Sony released old downlevel versions on new products.

Dale

JSWolf
03-10-2010, 06:31 PM
If Adobe were willing to foot the bill for creating and supporting any such upgrades, that might happen. But if they're going to force updates, I think we'd instead see device makers rolling their own engines - which is Not A Good Thing.

But because of Sony, we have to have ePub that has to work on their readers. We don't get to use new features or fixed bugs, as we now have to go with the lowest common denominator, which is ADE on the Sony Reader 300, 500, 505, 600, & 700. So do you think it's a good thing to allow Sony to keep ePub stagnant for everyone? It's not just for Sony users that it needs to be updated. It's for any user of ADE.

So let's say that Adobe fixed ADE so it's 100% compliant and has no bugs. Do we want to not take advantage of that because some company doesn't care about updating the ADE on their reader?

Valloric
03-10-2010, 06:50 PM
I do think Adobe should make it part of the contract that if they use ADE, it has to be updated.

Don't be ridiculous. I'm sure Adobe charges them for the updates. I know I would.

So no, I doubt Sony can get it for free. And they made a business decision not to pay for the updates (actually we don't have a clue what they did/are planning to do, we're all just guessing).

charleski
03-10-2010, 08:08 PM
Don't be ridiculous. I'm sure Adobe charges them for the updates. I know I would.

Well, that depends on what Adobe's game-plan is. With Apple poised in the wings ready to snatch the ePub market away from them, I'd have thought Adobe would be eager to get its fixes out there to avoid embarrassing comparisons.

JSWolf
03-10-2010, 08:39 PM
Well, that depends on what Adobe's game-plan is. With Apple poised in the wings ready to snatch the ePub market away from them, I'd have thought Adobe would be eager to get its fixes out there to avoid embarrassing comparisons.

Actually, unless publishers make separate ePub for Apple, they'll still have to pander to the lowest common denominator, and right now that may be Sony. So no matter how good iBooks or txtr is, they will be limited by Sony.

So yes, it's in everyone's best interests to get the reader makers to UPGRADE.

ATimson
03-10-2010, 10:15 PM
So let's say that Adobe fixed ADE so it's 100% compliant and has no bugs. Do we want to not take advantage of that because some company doesn't care about updating the ADE on their reader?
It would be nice if that happened. But I think it's more likely that reader companies would take their toys, go home, and roll their own reader software & DRM scheme. Instead of a low common denominator, you'd have no common denominator.

Maybe I'm just being too pessimistic. *shrug*

charleski
03-11-2010, 09:47 AM
It would be nice... I think it's more likely that reader companies would take their toys, go home, and roll their own reader software & DRM scheme.
This would end up costing them about as much as it would to license ADE (unless Chinese programmers work for free) and leave them trying to sell a device which wouldn't work with any of the current epub bookstores or libraries. Sounds like a recipe for going bankrupt pretty fast.

ATimson
03-11-2010, 10:30 AM
This would end up costing them about as much as it would to license ADE (unless Chinese programmers work for free) and leave them trying to sell a device which wouldn't work with any of the current epub bookstores or libraries. Sounds like a recipe for going bankrupt pretty fast.
It might cost them as much for the first device. It'd be cheaper than footing the bill for fixing Adobe's problems in the long run.

Besides, being completely incompatible with everybody else seems to be working for Amazon. :p

odt2epub
03-12-2010, 01:48 PM
I'm quite a novice here, but I think some of the "veterans" should drop their arrogant EpubCheck-bashing tone which is not backed by any logical arguments.

Blaming EpubCheck because some of the readers do not display ePub files even though they validate makes no sense. Someone put it quite well: what gives you the impression that the first thing is the cause of the second? That is like blaming the W3C HTML validator because IE6 is a crappy browser. Do you validate your (X)HTML and deploy it without even checking it in different browsers?

kovidgoyal
03-12-2010, 01:50 PM
I'm quite a novice here, but I think some of the "veterans" should drop their arrogant EpubCheck-bashing tone which is not backed by any logical arguments.

Sigh, learn how to read, then learn how to think, and when you've achieved all that, return.

kovidgoyal
03-12-2010, 02:33 PM
Allow me to construct a little gedanken experiment to illustrate things for people who need examples to understand abstract reasoning:

Say we want to create an EPUB e-book. Say it has 50 chapters, each of which is in a nice separate XHTML file. Now suppose we've edited all the files and seen they look really pretty in a nice WYSIWYG HTML editor. Now it's time to test our EPUB. Imagine two parallel realities.

In reality A our EPUB creator (let's call him Mr. X) was dropped on his head when he was a baby and so has an unholy reverence for XML schemas and thinks they are the cure for all evils.

In reality B our EPUB creator was dropped on his head twice so he's forgotten all his reverence, holy or unholy.

Finally, let's stipulate that Mr. X has a finite amount of time, say two hours to check his EPUB file before reaching his publishing deadline.

Reality A

Mr. X immediately fires up epubcheck, because he believes, with all the fervor of a true believer, that if epubcheck passes his EPUB, he's golden.

epubcheck spits out lots and lots of lines, say about 1000, that look like this:

ERROR: mybook.epub(8): could not parse content/index_cr_2.html: duplicate id: top
and this message is repeated 50 times.

Then there are messages like

ERROR: mybook.epub(12): attribute "name" not allowed at this point; ignored

repeated, say, 1000 times.

So our conscientious Mr. X is horrified. "My God, my EPUB will never work," he says. So he spends an hour googling to figure out what those error messages actually mean. He realises he has to, horror of horrors, actually use a text editor to fix those problems. So he painstakingly edits each file by hand (he's never learned to use regexps) and by dint of sheer determination manages to make all these error messages go away. He now has 30 seconds left to meet his deadline. But epubcheck says all is well, so he quickly fires up Firefox and submits his book to an online distributor. The distributor in turn runs epubcheck, which says nothing, and so the book is released to an unsuspecting public.

The next day Mr. X finds his inbox filling up with emails from disappointed readers. One guy says he tried to open the epub in Stanza and none of the links in the epub worked. Another guy says he tried to use the table of contents on his Sony reader and it took half an hour to load. A third guy says that on his PDA the text in some chapters runs off the screen. A fourth guy complains that the text on the dedication page seems to start at the middle of the page and run off the right.

epubcheck really saved the day for Mr. X

Switch to reality B

Mr. X saves his epub file, then immediately opens it in, say, the calibre viewer, where everything looks good. But being wise, he then opens it in desktop ADE, where he sees the dedication page is wrong and the text runs off the screen in a few chapters where he has used a <table> to lay out text. So he quickly fires up his editor and fixes the table, replacing it with a simple linear layout. Then he googles a bit and learns that ADE doesn't support text-anchor="middle" for SVG text (even though it is perfectly valid as per the SVG spec), so he changes his dedication page to use simple text.

Then he opens his epub on his Sony reader and finds the Table of Contents takes forever to load. He googles some more and learns that, for some weird reason, if he uses anchors in the toc.ncx, the Sony reader preloads all the files before rendering the table of contents. So he quickly goes and removes the anchors, which are rather useless (though perfectly valid) because all his chapters are in separate files anyway.

Phew, now his book is looking like it might work. But then he remembers Stanza, so he loads his file onto an iPod and discovers none of the hyperlinks work; they all take him to incorrect locations. Horrified, he googles some more and figures out that Stanza can't handle files in different sub-directories that have the same file name. Oh boy! Now he has to rename all his files and all the links that point to them. A panicky half an hour later, that's done.

Finally, Mr. X remembers that his best friend (who was dropped on his head only once) recommended he use epubcheck. But he looks at his clock and sees he doesn't have any time to do that anymore. So he fires up Firefox and submits his file to the distributor. In this reality, the distributor's tech guy is really lazy, so he doesn't have the system set up to run epubcheck. The EPUB is therefore released without any further ado.

The next day Mr. X's inbox fills up with emails from happy readers telling him how his book has changed their lives and how they were able to read it on their iPods, Sony readers and PDAs with no problems whatsoever.


I hope that clears up just why I object to epubcheck. And let me say that if, unlike Mr. X, you have infinite time to proof your epubs, by all means use epubcheck.
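The Stanza problem in reality B (files with the same name in different sub-directories) is one of the quirks above that can be caught mechanically, and no schema check will flag it because it is perfectly valid. A minimal Python sketch, assuming you only want a list of clashing names to rename; `duplicate_basenames` is a hypothetical helper, and whether a given Stanza version still trips over this is taken from the post, not verified:

```python
import posixpath
import zipfile
from collections import defaultdict


def duplicate_basenames(epub_path) -> dict[str, list[str]]:
    """Map each file name used in more than one directory of the EPUB
    to the archive paths that use it (empty dict if there are no clashes)."""
    seen = defaultdict(list)
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # ZIP paths always use '/', so posixpath is correct on any OS
            seen[posixpath.basename(name)].append(name)
    return {base: paths for base, paths in seen.items() if len(paths) > 1}
```

Anything this reports would still need its links updated by hand after renaming, just as Mr. X found.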

Jellby
03-12-2010, 02:55 PM
Fortunately, I was dropped on my head three times, so I'm in reality C. I run epubcheck, learn that attribute "name" is not supported in XHTML and should be changed to "id", and make my ids unique, and learn it for my next books. After my book passes epubcheck I'm fairly confident my book is valid ePUB, and now start checking for software glitches. I have a quick look in a web browser, and in calibre, then upload it to my Cybook and read the book. After some days or weeks of swearing over ADE's glitches, stupid Cybook's margins, incomplete Unicode font support, etc. (this is the time it takes me to "proofread" the book), I have a book that I find satisfying enough and that, to the best of my knowledge, should work in sufficiently-standards-conformant ePUB readers. Even if I know it's not perfect (no smallcaps in ADE, slow TOC in Sony, no SVG cover in you-name-it...), I decide to publish it as is, and if someone thinks it's crap, they can blame their reading software and write them hate mail... after all, my book is published for free, with no DRM, and everyone is free to change it :D

Seriously, though, I think both viewpoints are important, we should care about validation and following specs, and we should also care about real-world performance. Mr. X should have woken up two hours earlier, so he would have had time to make both kinds of check.

kovidgoyal
03-12-2010, 02:58 PM
Seriously, though, I think both viewpoints are important, we should care about validation and following specs, and we should also care about real-world performance. Mr. X should have woken up two hours earlier, so he would have had time to make both kinds of check.

I agree, he should have. But in the real world he won't. And so in a resource constrained environment, it's important that he not be deluded into thinking that running epubcheck should be his top priority. Which is all I'm trying to convey. Run epubcheck by all means, but please, please realize there's a lot more to producing a book that will render as you want it to, than doing an XML schema check. Indeed, running an XML schema check is just about the least important.

charleski
03-12-2010, 09:54 PM
Anyone who can't craft a simple regexp to fix the problems you describe has no business taking money from a publisher to create ebooks in the first place. I'd say that in reality A epubcheck has done us all a favour by exposing his incompetence and getting him fired :).

No-one is saying that epubcheck is enough on its own. If you only have a limited amount of time, then sure, there are other ways of testing that should take precedence. But I have to take you back to my first reply to this thread - epubcheck is about future-proofing. What happens in 5 years' time when some xhtml standards committee decides that the 'name' attribute now specifies the number of sparkly lights to surround a character? Publishers need to take the long-view. They need to minimise the chance that they'll have to spend a ton of money in the future getting someone to correct a ton of errors that shouldn't have been made in the first place.

kovidgoyal
03-12-2010, 10:10 PM
If at any time in the next 50 years it so happens that someone produces a widely used EPUB reader that fails to render a document because of a name attribute that is where it shouldn't be, I will print out this entire thread, and eat it.

Jellby
03-13-2010, 04:58 AM
Run epubcheck by all means, but please, please realize there's a lot more to producing a book that will render as you want it to, than doing an XML schema check.

On the other hand, flooding the net with valid ePUB files that display deficiently in current reading systems could be a way to force software developers into improving the reading systems ;)

kovidgoyal
03-13-2010, 10:54 AM
On the other hand, flooding the net with valid ePUB files that display deficiently in current reading systems could be a way to force software developers into improving the reading systems ;)

It could also be a way of ensuring EPUB dies an early death.

Peter Sorotokin
03-19-2010, 02:36 AM
If at any time in the next 50 years it so happens that someone produces a widely used EPUB reader that fails to render a document because of a name attribute that is where it shouldn't be, I will print out this entire thread, and eat it.

Kovid, don't make statements like that, please. I can hear the devil whispering in my ear: "only a small Easter egg... and such fun for everyone!"

kovidgoyal
03-19-2010, 03:35 AM
Kovid, don't make statements like that, please. I can hear devil wispering in my ear: "only a small Easter egg... and such fun for everyone!"

Ah, an evil developer; now there's a prospect to give my stomach cramps.

ATimson
03-19-2010, 03:46 AM
Ah an evil developer, now there's a prospect to give my stomach the cramps.
There are non-evil developers (asks a developer)? :blink:

Chang
06-02-2010, 03:52 AM
Maybe this goes under this thread..

I was wondering why my ePubs pass the epubcheck 1.0.5 validation but with some books ADE gives a warning "The document appears to have minor errors that might cause it to be displayed incorrectly". This doesn't even happen every time with same ePub, only sometimes.

Does someone know what's the logic behind ADE's warning because afaik it uses epubcheck validator also or am I totally wrong?

pdurrant
06-02-2010, 05:49 AM
Maybe this goes under this thread..

I was wondering why my ePubs pass the epubcheck 1.0.5 validation but with some books ADE gives a warning "The document appears to have minor errors that might cause it to be displayed incorrectly". This doesn't even happen every time with same ePub, only sometimes.

Does someone know what's the logic behind ADE's warning because afaik it uses epubcheck validator also or am I totally wrong?

ADE does its own validation, in addition to or possibly instead of using epubcheck.

Hamlet53
06-02-2010, 05:10 PM
If at any time in the next 50 years it so happens that someone produces a widely used EPUB reader that fails to render a document because of a name attribute that is where it shouldn't be, I will print out this entire thread, and eat it.

Reminds me of the days when I worked for a company producing online courses and was doing all the html/JavaScript/ASP/DB coding and always went through hell trying to produce a product that would work with multiple versions of NN and IE.

Anyway I do try and run my files through epubcheck before uploading here and this seems to be a good place to ask a couple of questions that have been annoying me.

First I am running under Windows XP and would like to have the epubcheck output write to a file so that I have the written output and also can use a BAT file to run it instead of typing the whole command at a command prompt each time. The trouble is that a BAT file of:

java -jar epubcheck-1.0.5.jar colomba.epub > verifyout.txt

produces the file verifyout.txt with content consisting only of:

Epubcheck Version 1.0.5


Whereas the actual output should continue:

ERROR: colomba.epub: length of first filename in archive must be 8, but was 9

Check finished with warnings or errors!

I always get this error, but have yet to figure out what it means.

pdurrant
06-02-2010, 05:34 PM
First I am running under Windows XP and would like to have the epubcheck output write to a file so that I have the written output and also can use a BAT file to run it instead of typing the whole command at a command prompt each time. The trouble is that a BAT file of:

java -jar epubcheck-1.0.5.jar colomba.epub > verifyout.txt

produces the file verifyout.txt with content consisting only of:

Epubcheck Version 1.0.5


Whereas the actual output should continue:

ERROR: colomba.epub: length of first filename in archive must be 8, but was 9

Check finished with warnings or errors!

I always get this error, but have yet to figure out what it means.

I think that epubcheck writes error reports out to stderr rather than stdout, which would explain why you're only seeing the version statement.

The error means just what it says. The first file in the epub must be called mimetype, must not be compressed and must contain just the characters application/epub+zip.

Either you've mis-named the mimetype file, or you're not zipping your epub up correctly.
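Those three rules can also be checked without a hex editor. A minimal Python sketch using the standard `zipfile` module; `check_mimetype_entry` is a hypothetical helper name, and it looks at the actual name, compression method and content of the first entry rather than just the length of its name:

```python
import zipfile

EXPECTED_MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(epub_path) -> list[str]:
    """Return a list of problems with the first entry of an EPUB (empty if none).

    Rules checked: the first entry is named 'mimetype', it is stored
    (not compressed), and its content is exactly 'application/epub+zip'.
    """
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        entries = zf.infolist()  # entries in archive order
        if not entries:
            return ["archive is empty"]
        first = entries[0]
        if first.filename != "mimetype":
            problems.append(f"first entry is {first.filename!r}, expected 'mimetype'")
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("first entry is compressed; it must be stored")
        if zf.read(first) != EXPECTED_MIMETYPE:
            problems.append("first entry content is not exactly 'application/epub+zip'")
    return problems
```

Usage would be something like `print(check_mimetype_entry("colomba.epub"))`; an empty list means the first entry looks right.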

DaleDe
06-02-2010, 08:24 PM
I think that epubcheck writes error reports out to stderr rather than stdout, which would explain why you're only seeing the version statement.

The error means just what it says. The first file in the epub must be called mimetype, must not be compressed and must contain just the characters application/epub+zip.

Either you've mis-named the mimetype file, or you're not zipping your epub up correctly.

Actually the error is cryptic since it would have been much more instructive to have said the first file must be named mimetype. If it is only checking the file name length then it is doing the user a disservice in passing a possibly bad archive that happens to have a filename that is 8 characters long but not mimetype.

Dale

dmapr
06-02-2010, 08:32 PM
I think that epubcheck writes error reports out to stderr rather than stdout, which would explain why you're only seeing the version statement.

That's right. Fortunately, the remedy is simple. Instead of

java -jar epubcheck-1.0.5.jar colomba.epub > verifyout.txt

one needs to say

java -jar epubcheck-1.0.5.jar colomba.epub > verifyout.txt 2>&1
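For anyone driving this from a script rather than a BAT file, here is a minimal Python sketch of the same redirection: the child's stderr is sent to the same file as its stdout, which is what `2>&1` does. `run_to_file` is a hypothetical helper name, and the sketch assumes `java` and the epubcheck jar are available as in the command above:

```python
import subprocess


def run_to_file(cmd: list[str], out_path: str) -> int:
    """Run cmd with both stdout and stderr written to out_path.

    Equivalent of the shell redirection `> out_path 2>&1`.
    Returns the child's exit code.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        # stderr=STDOUT makes the child write errors into the same file handle
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return result.returncode
```

For example: `run_to_file(["java", "-jar", "epubcheck-1.0.5.jar", "colomba.epub"], "verifyout.txt")`.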

Hamlet53
06-02-2010, 09:14 PM
My thanks to all three of you that have responded to my query so far.

Dmapr: Yes, once I knew what the output was being directed to I found that solution. So that problem solved.

The file name I am using, mimetype, and its content, application/epub+zip, are correct so it must have something to do with how I am doing the compression. I have always used 7-Zip File Manager according to the following steps:

add mimetype to create the archive using Compression Level – Store

add OEBPS folder (containing all book content files) to archive using Compression Level – Normal

add META-INF folder (holds container.xml) to archive using Compression Level – Normal

Change file name to epub extension (from zip)
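One way to take the archiver's own behaviour out of the equation is to script the zipping. A minimal Python sketch using the standard `zipfile` module, assuming the unpacked book sits in a folder containing `mimetype`, `META-INF` and `OEBPS` as in the steps above; `build_epub` is a hypothetical helper, and it only guarantees the ZIP layout (mimetype first and uncompressed), not that every reader will like the content:

```python
import os
import zipfile


def build_epub(src_dir: str, epub_path: str) -> None:
    """Zip an unpacked EPUB folder with 'mimetype' as the first, stored entry.

    Assumes src_dir holds 'mimetype' plus the META-INF and OEBPS folders,
    and that epub_path is NOT inside src_dir. Every other file is deflated.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 1. mimetype first, with no compression
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # 2. everything else, compressed, with '/'-separated relative paths
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written as the first entry
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Writing the output outside the source folder matters: otherwise `os.walk` could pick up the half-written EPUB itself.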

This has always produced an epub file readable as far as I am aware by all devices; personally verified Sony 900, Sony 505, Calibre, Firefox epub plugin, and Adobe reader.
This is why I have never worried over that message before, but if there is another suggestion for how to do the compression to produce a completely 'legit' file I have open eyes. :)

pdurrant
06-03-2010, 04:11 AM
This has always produced an epub file readable as far as I am aware by all devices; personally verified Sony 900, Sony 505, Calibre, Firefox epub plugin, and Adobe reader.
This is why I have never worried over that message before, but if there is another suggestion for how to do the compression to produce a completely 'legit' file I have open eyes. :)

Look at your epub with a hex editor. If bytes 0x1E to 0x39 are not mimetypeapplication/epub+zip then it's not a valid ePub file.
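Those offsets come from the ZIP format itself: a local file header is 30 (0x1E) bytes long, followed by the 8-byte name "mimetype" and then, if there is no extra field, the 20 stored bytes of "application/epub+zip", which end at 0x39. A minimal Python sketch of the same check without a hex editor; `has_epub_signature` is a hypothetical name, and it assumes the archiver wrote no extra field for the first entry (if one is present, this check reports failure):

```python
ZIP_LOCAL_HEADER_SIG = b"PK\x03\x04"
EXPECTED = b"mimetypeapplication/epub+zip"  # 8-byte name + 20-byte content


def has_epub_signature(epub_path: str) -> bool:
    """Check that bytes 0x1E..0x39 read 'mimetypeapplication/epub+zip'.

    0x00..0x1D is the 30-byte local file header, 0x1E..0x25 the name,
    0x26..0x39 the uncompressed content (assuming no extra field).
    """
    with open(epub_path, "rb") as f:
        header = f.read(0x3A)  # first 58 bytes
    return header[:4] == ZIP_LOCAL_HEADER_SIG and header[0x1E:0x3A] == EXPECTED
```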

DaleDe
06-03-2010, 03:11 PM
Look at your epub with a hex editor. If bytes 0x1E to 0x39 are not mimetypeapplication/epub+zip then it's not a valid ePub file.

Even with something like Notepad, the text above will show up clearly on the first line. That is why it is first and not compressed.

Dale

Hamlet53
06-03-2010, 03:14 PM
So while I would have loved to be a writer, that was not my Muse. I became a Chemical Engineer instead and had to make do with no Muse at all. I did learn the scientific experimental method though and thought I would apply it.

Someone :) recently uploaded an illustrated version of Kim by Kipling here that seemed likely to pass epubcheck.

Step 1. Download that epub file and run epubcheck on it. As expected, no errors or warnings.

Step 2. Decompress this file using 7-Zip and then compress that back into an epub file using the procedure described in my previous post.

Step 3. Run epubcheck on the file from step 2. Result:

ERROR: Kim2.epub: length of first filename in archive must be 8, but was 9

Check finished with warnings or errors!


So that's interesting. How do others do the compression, with what program?

Note changed file names to Kim.epub & Kim2.epub

DaleDe
06-03-2010, 03:18 PM
I use Info-ZIP from a command line, or just let Windows do it using drag and drop. I use 7-Zip for other things but not this.

JSWolf
06-03-2010, 03:25 PM
I use WinRAR.

dmapr
06-03-2010, 04:19 PM
I use WinRAR.

+1. Same here.

Chang
06-08-2010, 02:36 AM
Maybe this goes under this thread..

I was wondering why my ePubs pass the epubcheck 1.0.5 validation but with some books ADE gives a warning "The document appears to have minor errors that might cause it to be displayed incorrectly". This doesn't even happen every time with same ePub, only sometimes.

Does someone know what's the logic behind ADE's warning because afaik it uses epubcheck validator also or am I totally wrong?

ADE does its own validation, in addition to or possibly instead of using epubcheck.

It would be nice to know whether ADE knows something more than epubcheck. Should I be worried about the warning message?

ShawnM
08-09-2011, 01:06 PM
Just so you know: epubcheck is totally meaningless. A file that *passes* epubcheck may or may not work with any given epub renderer. A file that *does not pass* epubcheck may or may not work with any given epub renderer.


I agree with you.

This is what I find strange...I got two different results running two different checkers. These were epub books that I produced for an author.

i) I ran epubcheck (version 1.2) and it said there were some errors.
ii) I used Atlantis' epub checker and it passed with no errors.

Does epub check have some flaws I should know about?

The books render fine on every ebook device, so I know there are no "known issues", or formatting errors, that I need to worry about.

ATimson
08-09-2011, 01:11 PM
The only known issues that I'm aware of involve OEBPS 1.2 syntax OPF files not validating because it can't find the unique-identifier. But you can look at the official bug list (http://code.google.com/p/epubcheck/issues/list) yourself.
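The check involved is simple to state: the package element's unique-identifier attribute must name the id of an identifier element in the metadata. If you suspect you've hit that bug, a minimal Python sketch like the one below can confirm the reference resolves. It is a hypothetical sanity check, not how epubcheck implements it; it ignores namespaces and letter case on purpose so that OEBPS 1.2 style dc:Identifier is accepted too:

```python
import xml.etree.ElementTree as ET


def local(name: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def unique_identifier_resolves(opf_xml: str) -> bool:
    """True if <package unique-identifier="X"> matches some identifier element with id="X"."""
    root = ET.fromstring(opf_xml)
    uid = next((v for k, v in root.attrib.items() if local(k) == "unique-identifier"), None)
    if uid is None:
        return False
    # Case-insensitive so both dc:identifier (OPF 2.0) and dc:Identifier (OEBPS 1.2) match
    return any(
        local(el.tag).lower() == "identifier" and el.get("id") == uid
        for el in root.iter()
    )
```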

Jellby
08-09-2011, 01:17 PM
If epubcheck says it has some errors, the file is most probably not valid. That doesn't mean it won't work in any reader; it might, or it might not, but the file does not comply with the specs.

Toxaris
08-09-2011, 03:24 PM
I rely on FlightCrew only...

wannabee
08-10-2011, 02:28 AM
I use both so when one gives an error in martian maybe the other will give you a clue of what the heck it was talking about.

charleski
08-10-2011, 06:58 PM
First I am running under Windows XP and would like to have the epubcheck output write to a file so that I have the written output and also can use a BAT file to run it instead of typing the whole command at a command prompt each time. The trouble is that a BAT file of:

java -jar epubcheck-1.0.5.jar colomba.epub > verifyout.txt

produces the file verifyout.txt with content consisting only of:

Epubcheck Version 1.0.5

Use
java -jar epubcheck-1.0.5.jar colomba.epub 2> verifyout.txt
to get the stderr text.

JSWolf
08-10-2011, 08:27 PM
1.0.5 is an old version. Give 1.2 a go.

ShawnM
08-15-2011, 02:03 PM
If epubcheck says it has some errors, the file is most probably not valid. That doesn't mean it won't work in any reader; it might, or it might not, but the file does not comply with the specs.

I'm still a newb with the validation process, so I won't claim to know too much.

Why would epubcheck (from Google Project Hosting) indicate an error, while another checker gives no error?

The ebooks I produced work perfectly fine on the reading devices with no viewing errors.

Also, does anyone know which check service the major ebook retailers use? Or do they each have their own validation tool?

DaleDe
08-15-2011, 02:14 PM
I'm still a newb with the validation process, so I won't claim to know too much.

Why would epubcheck (from Google Project Hosting) indicate an error, while another checker gives no error?

The ebooks I produced work perfectly fine on the reading devices with no viewing errors.

Also, does anyone know which check service the major ebook retailers use? Or do they each have their own validation tool?

Did you try the most recent version of epubcheck? The earlier versions were buggy.

All checkers check against specs, they won't tell you if the eBook can be viewed correctly. How many devices did you try viewing it on? It may view on one correctly and fail on another. Even if it passes epubcheck it may still not view correctly due to reasons not related to the spec.

Yes, many retailers have their own checkers, and epubcheck isn't a particularly good one these days. Checkers are good, but you do need to use the latest version for good results.

Dale

ShawnM
08-16-2011, 01:47 PM
Did you try the most recent version of epubcheck? The earlier versions were buggy.
Dale

I used v1.2 which seems to be the most recent in the downloads section here...
http://code.google.com/p/epubcheck/downloads/list

Are there any other epub checkers available that might be more reliable?

pdurrant
08-16-2011, 02:00 PM
I used v1.2 which seems to be the most recent in the downloads section here...
http://code.google.com/p/epubcheck/downloads/list

Are there any other epub checkers available that might be more reliable?

Why do you think epubcheck 1.2 is unreliable?

If it gives you errors, there are almost certainly errors in your ePub. That it renders OK in some ePub reader means nothing.

st_albert
08-16-2011, 02:19 PM
Are there any other epub checkers available that might be more reliable?

Try Flight Crew (http://code.google.com/p/flightcrew/).

If nothing else, its error messages are a little more understandable.

ShawnM
08-16-2011, 02:38 PM
Try Flight Crew (http://code.google.com/p/flightcrew/).

If nothing else, its error messages are a little more understandable.

Thanks.

Actually understanding the error would be useful. The epubcheck that I recently used was a complete mess.