Quote:
Originally Posted by eschwartz
XML on its own is the single most useless form of data in the known universe, since unlike (X)HTML it has no schema to interpret it
There is a schema, and it defines the file format. Once you know which part of the book an XML element represents, you can apply whatever formatting you like, instead of depending entirely on a faithful rendering of the HTML author's CSS/physical formatting.
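To make that concrete, here is a rough Python sketch of reader-side formatting. The element names (`book`, `title`, `heading`, `para`, `epigraph`) and the formatting rules are made up for illustration and do not come from any particular format; only the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: each element says WHAT the text is,
# not how it should look. The element names are invented for this sketch.
SAMPLE = """<book>
  <title>Example Book</title>
  <chapter>
    <heading>Chapter One</heading>
    <para>First paragraph.</para>
    <epigraph>A short quotation.</epigraph>
  </chapter>
</book>"""

# The reader (not the author) decides how each semantic role is displayed.
RENDERERS = {
    "title": lambda t: t.upper(),
    "heading": lambda t: "\n== " + t + " ==",
    "para": lambda t: "    " + t,
    "epigraph": lambda t: "        > " + t,
}

def render(xml_text, renderers=RENDERERS):
    """Return formatted lines for every element that has a known role."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():  # document order, root included
        fmt = renderers.get(elem.tag)
        text = (elem.text or "").strip()
        if fmt and text:
            lines.append(fmt(text))
    return lines

if __name__ == "__main__":
    print("\n".join(render(SAMPLE)))
```

Swapping in a different `RENDERERS` table changes the look of the whole book without touching the source file, which is the point of keeping the markup semantic.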
I have yet to see an EPUB that has no semantically absurd elements like '<p> </p>'. That is physical formatting, and it can easily be gotten rid of with specific semantic markup.
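If you want to check a book for this yourself, something like the following should work on a single XHTML content file taken out of an EPUB. It is only a sketch: the function name `empty_paragraphs` and the sample document are my own, and it assumes the file is well-formed XHTML in the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace}localname".
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Made-up sample content file for illustration.
SAMPLE_XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>Real text.</p>"
    "<p> </p>"                      # spacer paragraph
    '<p><img src="a.png" alt=""/></p>'  # has a child, so not empty
    "<p>&#160;</p>"                 # spacer made of a non-breaking space
    "</body></html>"
)

def empty_paragraphs(xhtml_text):
    """Return <p> elements with no child elements and only whitespace text."""
    root = ET.fromstring(xhtml_text)
    return [
        p for p in root.iter(XHTML_NS + "p")
        # str.strip() also removes non-breaking spaces (U+00A0)
        if len(p) == 0 and not (p.text or "").strip()
    ]

if __name__ == "__main__":
    print(len(empty_paragraphs(SAMPLE_XHTML)), "empty paragraph(s) found")
```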
Consider that many libraries of any size use their own formats for their digitized collections, because EPUB cannot represent the document structure needed for digital processing. Many of these formats are exactly that: semantic XML-based formats.