MobileRead Forums - View Single Post

Sarmat89 · 12-10-2015, 07:06 AM

Quote:

Originally Posted by eschwartz

XML on its own is the single most useless form of data in the known universe, since unlike (X)HTML it has no schema to interpret it

There is a schema: the one that defines the file format. As soon as you know which part of the book an XML element represents, you can apply whatever formatting you like, instead of depending entirely on a faithful rendering of the HTML author's CSS/physical formatting.
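
As a rough illustration of that point, here is a minimal Python sketch. The element names (`book`, `chapter`, `title`, `epigraph`, `para`) and the `STYLES` table are made up for the example and are not taken from any real format; the only idea is that once an element says what part of the book it is, the reading side decides how it looks.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: elements name book parts, not visual styles.
SAMPLE = """
<book>
  <chapter>
    <title>Chapter One</title>
    <epigraph>An example epigraph.</epigraph>
    <para>First paragraph of the chapter.</para>
    <para>Second paragraph of the chapter.</para>
  </chapter>
</book>
"""

# Reader-chosen presentation per semantic role (an assumption for this sketch).
STYLES = {
    "title": lambda text: text.upper() + "\n",
    "epigraph": lambda text: "        " + text + "\n",
    "para": lambda text: "    " + text,
}


def render(xml_text, styles=STYLES):
    """Return plain text, formatting each known element by its semantic role."""
    root = ET.fromstring(xml_text)
    lines = []
    # iter() walks the tree in document order, so output follows the book.
    for elem in root.iter():
        style = styles.get(elem.tag)
        if style is not None and elem.text:
            lines.append(style(elem.text.strip()))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(SAMPLE))
```

Passing a different `styles` table to `render` changes the presentation without touching the book file at all.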

I have yet to see an EPUB that has no semantically absurd elements like '<p>&nbsp;</p>'. This is physical formatting, which specific semantic markup can easily get rid of.
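
For what it's worth, finding and dropping such spacers is purely mechanical, which is part of why they count as physical formatting and not semantic markup. A rough Python sketch, assuming the spacer is always a `<p>` holding nothing but whitespace or a non-breaking space; the function name is made up, and the regex is a shortcut, not a real XHTML parser.

```python
import re

# Matches a <p> whose only content is whitespace or a non-breaking space,
# written as &nbsp; (with or without the semicolon) or &#160;.
# A literal U+00A0 character is already covered by \s in Python str patterns.
SPACER_P = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;?|&#160;)*</p>", re.IGNORECASE)


def strip_spacer_paragraphs(xhtml):
    """Remove paragraphs that exist only to add vertical space."""
    return SPACER_P.sub("", xhtml)


if __name__ == "__main__":
    print(strip_spacer_paragraphs("<p>One</p><p>&nbsp;</p><p>Two</p>"))
```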

Consider that many libraries of any size use their own formats for their digitized collections, because EPUB cannot represent the document structure needed for digital processing. Many of these formats are exactly that: semantic XML-based formats.