Quote:
Originally Posted by skreutzer
Oh, Microsoft Word has fooled you ;-) The term "XML export" is technically nonsense, because XML isn't a format in itself, but a way to define all kinds of formats. So there is no "XML format" per se, one would have to ask "which XML format?" (because there are lots of them, XHTML included). So what Word calls ".xml", is their "Word XML format", and yes, of course, such a thing is useless, if they're not even capable of outputting valid XHTML in the first place. I do not talk about stupid custom XML formats, but of reasonable ones.
Well, not quite. XML by itself is meaningless. It is only a markup language that adds structure. I can export whatever I want as XML, as long as I honor the structure. Without the schema, however, the XML is useless. In the schema we define what the tags mean and what the structure should look like. XHTML is not a format, it is just XML with a (more or less) strictly defined schema.
Word XML is just that. It is perfectly valid XML with a schema specifically for Word documents, just as intended. In principle it is possible to load the XML in Word and get your original document back. The same applies to their HTML output. It is valid, even if it is not what we would like.
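To make the point concrete, here is a minimal Python sketch. The two sample documents and the function names are made up for illustration; they are not Word's actual output. Both samples are well-formed XML with the same structure, but the parser only sees tag names. What "para" or "h1" means is something only a schema (or an agreement between people) can tell you.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same structure, different vocabularies.
WORD_LIKE = "<doc><para style='Heading1'>Chapter 1</para><para>Some text.</para></doc>"
XHTML_LIKE = "<body><h1>Chapter 1</h1><p>Some text.</p></body>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def tag_names(xml_text):
    """List the tag names in document order. The parser knows nothing beyond the names."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]


word_tags = tag_names(WORD_LIKE)
xhtml_tags = tag_names(XHTML_LIKE)
```

Both samples pass `is_well_formed`, yet nothing in the parse result says that a "para" with style "Heading1" and an "h1" are the same kind of thing.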
All XML 'formats' are custom, but some schemas are public and agreed upon by various parties.
That is also one of the issues. A schema needs to be agreed upon to correctly identify the semantic value of the tags. You cannot expect all (or any) word processors to honor the schema you would like. So you would need to map the word processor's XML schema to your schema. That will not always be possible.
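A rough Python sketch of such a mapping, under loud assumptions: the source vocabulary ("doc", "para", "textbox", the "style" attribute) and the mapping table are hypothetical and only stand in for whatever a real word processor emits. Elements without an equivalent in the target schema are collected instead of converted, which is exactly the "not always possible" case.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: (source tag, style attribute) -> target tag.
TAG_MAP = {
    ("para", "Heading1"): "h1",
    ("para", None): "p",
}


def map_document(xml_text):
    """Return (converted root element, list of source keys that have no mapping)."""
    source = ET.fromstring(xml_text)
    target = ET.Element("body")
    unmapped = []
    for el in source:
        key = (el.tag, el.get("style"))
        new_tag = TAG_MAP.get(key)
        if new_tag is None:
            # No equivalent in the target schema: report it rather than guess.
            unmapped.append(key)
            continue
        child = ET.SubElement(target, new_tag)
        child.text = el.text
    return target, unmapped


SAMPLE = (
    "<doc><para style='Heading1'>Chapter 1</para>"
    "<para>Some text.</para><textbox>Floating note</textbox></doc>"
)
body, unmapped = map_document(SAMPLE)
result = ET.tostring(body, encoding="unicode")
```

Here the heading and the paragraph come through, while the "textbox" ends up in `unmapped` because the target schema in this sketch has nothing to map it to.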
You also greatly overestimate the willingness of writers to change their ways, and underestimate their reaction to being forced to work in a certain way. They would rather use another program, or even WordPad, than change their way of working. Only a small number of writers are willing to do that.
You might take a look at my Word add-in. I create clean HTML output (or XHTML directly in an ePUB) out of Word, but at a price. Styling like margins and fonts will be removed. It would be relatively easy to create an export for another format (e.g. Markdown) in the same way.
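To give an idea of what a Markdown export from clean HTML could look like, here is a small Python sketch. It is not how the add-in works; the class name, the function name and the handful of supported tags are assumptions for illustration, and it only handles headings, paragraphs and simple emphasis.

```python
from html.parser import HTMLParser


class MarkdownExporter(HTMLParser):
    """Hypothetical converter: turns a few clean HTML tags into Markdown."""

    PREFIX = {"h1": "# ", "h2": "## ", "p": ""}   # block-level tags
    INLINE = {"em": "*", "strong": "**"}          # inline emphasis markers

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.current = None   # pieces of the block being built, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX:
            self.current = [self.PREFIX[tag]]
        elif tag in self.INLINE and self.current is not None:
            self.current.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.PREFIX and self.current is not None:
            self.blocks.append("".join(self.current).strip())
            self.current = None
        elif tag in self.INLINE and self.current is not None:
            self.current.append(self.INLINE[tag])

    def handle_data(self, data):
        # Text outside a known block tag is ignored in this sketch.
        if self.current is not None:
            self.current.append(data)


def html_to_markdown(html_text):
    exporter = MarkdownExporter()
    exporter.feed(html_text)
    exporter.close()
    return "\n\n".join(exporter.blocks)


markdown = html_to_markdown("<h1>Chapter 1</h1><p>Some <em>important</em> text.</p>")
```

Because the input is already clean (no margins, no fonts), the converter can stay this small. Anything outside the listed tags is simply dropped, which is the same kind of price as mentioned above.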
I like the idea, but I think you are too optimistic. However, if I can help to improve things, I probably will.