Cleaner Word HTML

03-18-2011, 03:59 AM
Has anyone seen this article? It describes using XML stylesheets to make Word export much cleaner XHTML code. I tried it on a few files and it does seem to work rather nicely.

Basically, what you do is save the document as an XML file, but apply an XML stylesheet (XSLT) while saving. The result has the extension .xml, but is in fact a much cleaner HTML file.
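The stylesheet does the real work: it walks Word's XML and writes out only the plain HTML elements you actually want. As a rough illustration of that idea (this is not XSLT, and not the article's stylesheets), here is a Python sketch using the standard library's ElementTree. It assumes the WordprocessingML namespace used by Word 2007 and later (Word 2003's XML format uses a different namespace URI), keeps only paragraphs, bold and italic, and drops all other formatting. The function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the WordprocessingML namespace written by Word 2007 and later.
# Word 2003's XML format uses a different URI; swap it in here if needed.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _is_on(props, tag):
    """True if a toggle such as <w:b/> is present and not switched off via w:val."""
    el = props.find(tag, NS)
    return el is not None and el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def run_to_html(run):
    """Turn one <w:r> run into HTML text, keeping only bold and italic."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    if not text:
        return ""  # skip empty runs instead of emitting empty tags
    html = escape(text, quote=False)
    props = run.find("w:rPr", NS)
    if props is not None:
        if _is_on(props, "w:i"):
            html = f"<em>{html}</em>"
        if _is_on(props, "w:b"):
            html = f"<strong>{html}</strong>"
    return html


def wordml_to_xhtml(xml_text):
    """Emit one <p> per <w:p>; fonts, styles, tabs and breaks are simply dropped."""
    root = ET.fromstring(xml_text)
    out = []
    for p in root.iter(f"{{{W}}}p"):
        body = "".join(run_to_html(r) for r in p.findall("w:r", NS))
        out.append(f"<p>{body}</p>")
    return "\n".join(out)
```

A real stylesheet would also map headings, lists and tables; this only shows the principle of keeping the structure and throwing away Word's formatting noise.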

03-18-2011, 08:28 AM
Ohhh... that looks like excellent info! Thanks for posting that link, Toxaris. I need to brush up on my very rusty XSL and play around with this. I kept having vague thoughts that, since the native .docx format used by MS Word 2007 and later is actually a ZIP package of XML files, it might be possible to write an XSL transformation that turns it into clean XHTML. But so far I have never quite got around to investigating that. This provides an excellent starting point!
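To make the point about .docx concrete: a .docx file is an ordinary ZIP archive (an Office Open XML package), and the document text lives in an XML part conventionally named `word/document.xml`, which is what an XSL transformation would take as input. A minimal Python sketch using only the standard library; the function names are hypothetical, it assumes the part uses UTF-8, and it only reads the package without transforming anything.

```python
import zipfile


def docx_parts(path):
    """List the files packed inside a .docx, which is an ordinary ZIP archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def main_document_xml(path):
    """Return the main document part as text.

    Assumes the conventional part name word/document.xml and UTF-8 encoding;
    a stricter reader would look up the actual part via _rels/.rels.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.read("word/document.xml").decode("utf-8")
```

For example, `docx_parts("report.docx")` (a hypothetical file) would typically list entries such as `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`.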


03-18-2011, 06:04 PM
Best part is, some example stylesheets are included, so you can try it right away...