Old 02-09-2009, 06:22 PM   #46
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by Komenor
Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional pre-requirement to the application...
You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.

Let me elaborate...

Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.

OK, so the application accepts an HTML file. Is the file valid HTML? You can't make that a precondition. I'm sorry, you just can't. Most HTML out there is nowhere near being valid, and the user could need to import HTML he didn't write himself.

So the app needs to accept invalid HTML, that is, HTML that displays OK in a modern browser but does not follow the required standards. And with that, you just blew any possibility of having a guarantee that the epub you export will always be standards compliant.

Why?

Well, you can't design a useful algorithm that accepts invalid HTML and outputs valid HTML. A useful algorithm would have these requirements, for any input:

1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.

The first one is easy on its own. If you remove the second one, then for any input you can just output whatever valid document you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because working out what the author meant from broken markup is incomputable.
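To make the "first one is easy" point concrete, here is a minimal Python sketch. The names `always_valid`, `is_well_formed` and `FIXED_OUTPUT` are made up for illustration, and the check is only for XML well-formedness (a much weaker test than real XHTML or epub validation). It satisfies requirement 1 by ignoring the input completely, which is exactly why it is useless:

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed document; any well-formed XHTML would do here.
FIXED_OUTPUT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Untitled</title></head>'
    '<body><p>Placeholder</p></body>'
    '</html>'
)

def always_valid(html_input: str) -> str:
    """Meets requirement 1 only: the input is thrown away entirely."""
    return FIXED_OUTPUT

def is_well_formed(markup: str) -> bool:
    """Rough stand-in for a validity check: does it parse as XML at all?"""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Feeding it `<p><b>broken</p>` or an entire novel gives the same well-formed output, so requirement 2 is not even attempted.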

Now, you could design an algorithm that fulfills both requirements for some input, but not for all. And no, not even Tidy can give you that, because it is theoretically impossible.

So you're stuck now. You can't guarantee your users that you will always output a valid epub file no matter what they import. You can do your best (and you should), but in the end... The second requirement is much more important than the first one. So you fix what you can and possibly tell the user about what you can't.
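As a rough idea of what "tell the user about what you can't fix" might look like, here is a sketch in Python using the standard library's `html.parser`. The `NestingChecker` class, the `report_problems` function and the short `VOID` list are assumptions made for this example, not how any particular editor does it. It only flags tag nesting problems and makes no attempt to repair them:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (deliberately incomplete list).
VOID = {"br", "hr", "img", "meta", "link", "input"}

class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> open and close at once.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            line, _col = self.getpos()
            self.problems.append(f"line {line}: unexpected </{tag}>")

def report_problems(html: str) -> list:
    """Return human-readable nesting problems; empty if none were found."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"unclosed <{tag}>" for tag in checker.stack]
    return checker.problems + unclosed
```

For `<p><b><i>text</b></i></p>` this reports the misnested tags but cannot say whether the author wanted the italics inside the bold or the other way around. That decision stays with the user.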

If they really care about producing a valid epub file, they will have to fix the errors your app can't fix themselves. And so you make it easy for them and give them access to the source. And if they introduce any errors whilst editing the source, it's their fault. They will probably have to fix it by editing the source, too.

Now if you wanted an editor that could only create epub files from scratch, then you could guarantee standards compliance, provided you disallow direct source code editing. But you don't want to make that kind of editor.

Your output can only be as good as the input (maybe slightly better, for trivial errors in the original file). The editor can't turn shit into gold, and cannot give guarantees about compliance. Any editor that does is flat-out lying.

Last edited by Valloric; 02-09-2009 at 06:26 PM.