I think what you get in the ePUB XHTML depends on what you do in Word with cut & paste and editing. In that shot I posted I changed the ". Not" to "… not" in Word and did a conversion.
The fragmentation of the "not" that you see in the ePUB XHTML reflects what's in the Word DOCX XML:
ePUB XHTML
Code:
<i class="calibre1">n</i><span class="text1">ot</span>
DOCX XML
Code:
<w:r w:rsidR="00AB4F90" w:rsidRPr="00160E46">
<w:t>n</w:t>
</w:r>
<w:r w:rsidRPr="00160E46">
<w:t>ot.</w:t>
</w:r>
So my conclusion is that the <span class="text1">blah blah</span> sequences stem directly from the XML that Word creates in its DOCX files, and that the more editing one does on the DOCX, the more disorderly the XML becomes. After conversion that results in less than optimal XHTML, i.e. Garbage In, Garbage Out.
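To see the fragmentation for yourself, here is a rough Python sketch that lists the text of each run in a paragraph and then joins adjacent runs whose formatting is identical. It is only an illustration, not what Calibre or Word actually does: the <w:p> wrapper around the two runs quoted above is my assumption, and run_format, run_text and merged_runs are made-up helper names. The rsid attributes are ignored because they only record editing sessions. If the real runs carry different <w:rPr> formatting (the italic "n", for instance), they would stay separate.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a DOCX
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# The two runs quoted above, wrapped in a hypothetical paragraph element
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r w:rsidR="00AB4F90" w:rsidRPr="00160E46"><w:t>n</w:t></w:r>
  <w:r w:rsidRPr="00160E46"><w:t>ot.</w:t></w:r>
</w:p>
""".strip()

def run_format(run):
    """Serialise the run's <w:rPr> (its formatting), or '' if it has none."""
    rpr = run.find("w:rPr", NS)
    return "" if rpr is None else ET.tostring(rpr, encoding="unicode").strip()

def run_text(run):
    """Concatenate the text of all <w:t> children of a run."""
    return "".join(t.text or "" for t in run.findall("w:t", NS))

def merged_runs(paragraph):
    """Return (format, text) pairs with adjacent same-format runs joined."""
    merged = []
    for run in paragraph.findall("w:r", NS):
        fmt, text = run_format(run), run_text(run)
        if merged and merged[-1][0] == fmt:
            # Same formatting as the previous run: append the text to it
            merged[-1] = (fmt, merged[-1][1] + text)
        else:
            merged.append((fmt, text))
    return merged

paragraph = ET.fromstring(SAMPLE)
fragments = [run_text(r) for r in paragraph.findall("w:r", NS)]
joined = [text for _, text in merged_runs(paragraph)]
print(fragments)  # ['n', 'ot.']
print(joined)     # ['not.']
```

To try it on a real file, unzip the DOCX and feed the paragraphs from word/document.xml through the same functions.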
One way of ensuring better consistency might be to paste plain ASCII text into the DOCX. You can set this up via Word Options -> Advanced -> Cut, copy, and paste. You'd then have to do all the font styling manually.
If the examples you posted originate from LIT it might be interesting to see the XHTML that a LIT to EPUB conversion creates.
BR