MobileRead Forums - View Single Post

eschwartz · 12-07-2015, 10:34 PM

Quote:

Originally Posted by Sarmat89

1. The important metadata are stored in EPUBs in plain text format, and some cannot be stored at all.
The solution: store the metadata in XML format with a defined set of fields: (Name;Surname;UUID;etc); (Series;#); (Genre;Sub-genre;Sub-sub-genre;etc).

Demonstrably false. Plaintext metadata has never existed in the EPUB spec, and if you tried to use plaintext metadata, I am not aware of a single piece of code in the world that would understand it.

This would be because EPUB metadata is written in XML (the OPF package document, using Dublin Core elements), just like you wanted.
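To make that concrete, here is a minimal sketch using only Python's standard library. It reads the fields from the quote (name, surname/sort name, UUID, series and number) out of a hand-written EPUB 3 package document. The sample values and the `read_metadata` helper are made up for illustration, not taken from any real book or tool.

```python
import xml.etree.ElementTree as ET

# Namespaces used by an EPUB 3 package document (OPF).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample; every value here is made up for illustration.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example Title</dc:title>
    <dc:creator id="author">Jane Doe</dc:creator>
    <meta refines="#author" property="file-as">Doe, Jane</meta>
    <meta property="belongs-to-collection" id="series">Example Series</meta>
    <meta refines="#series" property="collection-type">series</meta>
    <meta refines="#series" property="group-position">2</meta>
  </metadata>
</package>"""


def read_metadata(opf_text):
    """Pull a few book-level fields out of an OPF package document (hypothetical helper)."""
    md = ET.fromstring(opf_text).find("opf:metadata", NS)

    # Collect <meta refines="#id" property="...">value</meta> as {id: {property: value}}.
    refinements = {}
    for meta in md.findall("opf:meta", NS):
        target = meta.get("refines")
        if target:
            refinements.setdefault(target.lstrip("#"), {})[meta.get("property")] = meta.text

    creator = md.find("dc:creator", NS)
    series = md.find("opf:meta[@property='belongs-to-collection']", NS)
    return {
        "uuid": md.findtext("dc:identifier", namespaces=NS),
        "title": md.findtext("dc:title", namespaces=NS),
        "author": creator.text,
        "author_sort": refinements.get(creator.get("id"), {}).get("file-as"),
        "series": series.text if series is not None else None,
        "series_index": (
            refinements.get(series.get("id"), {}).get("group-position")
            if series is not None
            else None
        ),
    }


if __name__ == "__main__":
    for key, value in read_metadata(OPF).items():
        print(f"{key}: {value}")
```

The sketch only handles the EPUB 3 `belongs-to-collection` form; many EPUB 2 files instead carry the series through calibre's `calibre:series` meta convention, which would need its own lookup.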

Quote:

The metadata come from a list which is designed for books, unlike DC.

You know, I am fairly sure I mentioned this already... but if you have a problem with the Dublin Core (DC) metadata, why not complain to the DCMI, rather than trying to start a competing ebook format for the sole purpose of arguing with them over the list of available metadata fields?

Quote:

2. The underlying HTML/CSS code is not standardized, and allows partial implementation of its features. There is no way to predict what the user's device is going to support.
The solution: the exhaustive XML semantic format, in which each part of a book format has one and only one possible representation. Physical formatting (CSS) is separate from logical formatting (elements used).

HTML and CSS are very standardized.
And anyone who is going to break HTML+CSS compatibility by partial non-implementation will also partially not implement your proposed schema.

But XML has no representation and no formatting and no concept of CSS, except through a linked schema. Sure, XML is impossible to misinterpret, but that is only because it doesn't define anything.
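A quick way to see this: Python's standard-library parser accepts any well-formed vocabulary and reports only element names, nesting and text. The two tiny documents below are invented for this illustration; to the parser they are equally valid and equally meaningless, and only a schema (or a reader that already knows what `epigraph` means) can do anything useful with the first one.

```python
import xml.etree.ElementTree as ET

# Two well-formed documents with invented vocabularies: one "bookish", one nonsense.
BOOKISH = "<book><epigraph><p>Some epigraph text.</p></epigraph></book>"
NONSENSE = "<zorp><blarg><quux>Some epigraph text.</quux></blarg></zorp>"


def is_well_formed(xml_text):
    """Well-formedness is a purely syntactic check; it says nothing about meaning."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def paths(xml_text):
    """Return (element path, text) pairs: all a bare XML parser knows about a document."""
    out = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        out.append((path, (elem.text or "").strip()))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return out


for doc in (BOOKISH, NONSENSE):
    print(is_well_formed(doc), paths(doc))
```

Nothing in either result says "italic", "indented" or "smaller text"; that knowledge has to come from somewhere outside the XML itself.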

So we are back to your schema. Since XHTML is the current schema used by EPUB, and ereader makers went ahead and "partially implemented its features", they will presumably do the same to your proposed schema (notwithstanding that you still haven't actually proposed it).

Quote:

The device is free to apply any needed transformations based on the semantic content to achieve the supported output. Books are no longer a set of paragraphs, divs and spans with no semantic role and opaque formatting. Example: in HTML, you cannot be sure what the paragraph style "SJ8M" (left margin, top margin, no break, italics, 80% size) is intended to do, and you cannot adjust the presentation if you do not support the CSS features from the above list. With XML, it is clear that it is 'epigraph/p', and you can use any methods you support to format it according to your internal stylesheet combined with book and user stylesheets.

Oh, wait.

Do you mean there should be NO schema, and every ereader needs to reimplement the entire parsing and rendering logic?

Now I'm getting really confused, because I literally cannot think of one reason why anyone would ever suggest such a thing, even taking into account your "disagreement" with EPUB.
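For what it's worth, the 'epigraph/p' case from the quote does not need a new format. EPUB 3 XHTML can already label the role with the `epub:type` attribute, and "epigraph" is a term in the EPUB 3 Structural Semantics Vocabulary. A reader that cannot (or will not) apply the book's "SJ8M" class can still find the role and use its own stylesheet. The fragment and the `paragraphs_with_role` helper below are a hypothetical sketch, not code from any actual reading system.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPS_NS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute
EPUB_TYPE = f"{{{OPS_NS}}}type"          # Clark notation: {namespace}type

# Hand-written EPUB 3 content fragment; "SJ8M" stands in for the opaque class name.
CHAPTER = f"""<html xmlns="{XHTML_NS}" xmlns:epub="{OPS_NS}">
<body>
  <section epub:type="epigraph">
    <p class="SJ8M">Some epigraph text.</p>
  </section>
  <p>First paragraph of the chapter.</p>
</body>
</html>"""


def paragraphs_with_role(xhtml_text):
    """Return (role, text) for each <p>, where role is the epub:type of the
    nearest element (the <p> itself or an ancestor) that carries one, else None."""
    results = []

    def walk(elem, role):
        role = elem.get(EPUB_TYPE, role)  # inherit the role unless overridden
        if elem.tag == f"{{{XHTML_NS}}}p":
            results.append((role, "".join(elem.itertext()).strip()))
        for child in elem:
            walk(child, role)

    walk(ET.fromstring(xhtml_text), None)
    return results


print(paragraphs_with_role(CHAPTER))
```

Whether a particular ereader actually keys its styling off `epub:type` is up to that ereader; the point is only that the semantic hook already exists in the current schema.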

Quote:

3. Using browsers to display the raw HTML means working around the renderers' quirks and compatibility issues.
The solution: XML is free of preconceptions about the physical format necessary in WWW.

True. XML has no defined look and therefore cannot have any preconceptions about how your book should look.

This is a bad thing, as your book has no look, only a folded XML tree...

Quote:

I believe it is evident why a custom XML format yields better results for 95% of fiction and non-fiction EBooks than EPUB.

XHTML is a custom XML format, and EPUB content documents are validated against XHTML.
Aside from that, XML on its own is the single most useless form of data in the known universe, since:

- unlike (X)HTML, it has no schema to interpret it
- unlike Markdown or text/plain, it has no visual meaning

Thus, it has no meaning whatsoever, and is in fact the Platonic ideal of "empty meaning" (or perhaps meaning that fell into a black hole).

It is marginally possible to access data that is even more meaningless by inspecting the contents of /dev/urandom.