06-08-2013, 03:54 AM   #2
kovidgoyal
creator of calibre
Do the following little experiments:

1) Unzip a docx file and open word/document.xml in a text editor. That should tell you whether the conversion is generating extra markup or not. Hint: it isn't. The HTML markup is an almost literal translation of the markup in the docx. Every <span> in the HTML (with a couple of exceptions) corresponds to a <w:t> in the docx markup.

2) Try converting this docx file using the docx input plugin: http://calibre-ebook.com/downloads/demos/demo.docx That will show you just how much formatting it throws away.
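
If you would rather not read the XML by eye, experiment 1 can be roughly approximated with a short script. This is only a sketch, not anything from calibre itself: count_text_elements and make_sample_docx are hypothetical names, and the sample file built in memory is a stripped-down stand-in containing only word/document.xml, not a complete docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace bound to the w: prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_text_elements(docx):
    """Count <w:r> (run) and <w:t> (text) elements in a docx.

    `docx` is a file path or a binary file-like object.
    (Hypothetical helper, not part of calibre.)
    """
    with zipfile.ZipFile(docx) as zf:
        # The main document body lives at word/document.xml inside the zip
        root = ET.fromstring(zf.read("word/document.xml"))
    return {
        "w:r": sum(1 for _ in root.iter(f"{{{W_NS}}}r")),
        "w:t": sum(1 for _ in root.iter(f"{{{W_NS}}}t")),
    }


def make_sample_docx():
    """Build a tiny docx-like zip in memory, for demonstration only."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Hello </w:t></w:r>"
        "<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Swap make_sample_docx() for the path to your own .docx file
    print(count_text_elements(make_sample_docx()))
```

Run it on your own file and compare the <w:t> count with the number of <span> elements in the converted HTML. If the two are close, the spans are coming from the docx, not from the conversion.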

That said, optimizing the markup generated by the conversion is on my todo list. As I said, the current markup is an almost literal translation, so there is scope for analyzing and optimizing it.