I have an atrocious Word Doc that resulted from scanning a paper book, and this thread encouraged me to finally attempt to convert it into an epub.
The DocXImport plugin was perfect (thank you DiapDealer!), but I also tried saving the Word Doc as filtered HTML, which resulted in a lot of crazy inline styles. I figured I could do a nuclear search/replace to clean it up, but my regex skills are weak.
The pattern <p.*?> picked up this:
Code:
<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>
but missed this:
Code:
<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:
0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:
inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>
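The only difference I can see between the two tags is that the second one wraps across several lines. To check that outside the editor, here is a minimal sketch in Python. Python's `re` module is an assumption on my part and is not necessarily the same engine the editor uses, and the variable names and the `matches` helper are made up for illustration. It runs the original pattern against both tags, then tries two variants: the same pattern with the "dot matches newline" flag, and a negated character class in place of the dot.

```python
import re

# The two tags from the post; the second one wraps across three lines.
single_line = "<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>"
multi_line = (
    "<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:\n"
    "0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:\n"
    "inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>"
)

# Original pattern: by default "." matches any character except a newline.
original = re.compile(r"<p.*?>")
# Same pattern, but with the flag that lets "." match newlines too.
dotall = re.compile(r"<p.*?>", re.DOTALL)
# Alternative: "anything that is not >", which includes newlines.
negated = re.compile(r"<p[^>]*>")


def matches(pattern, text):
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


for name, pattern in [("original", original), ("dotall", dotall), ("negated", negated)]:
    print(name, matches(pattern, single_line), matches(pattern, multi_line))
```

In this sketch the original pattern matches the single-line tag but not the wrapped one, while both variants match both tags, which points at the line breaks as the cause. Note that all three patterns would also match other tags starting with `<p`, such as `<pre>`.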
As I said, DocXImport solved my problem by blasting away all the crazy styling, but I would really like to learn why my regex failed and what would work instead.
Help, please, thank you!