Old 08-03-2019, 01:38 AM   #23
odamizu
just an egg
Posts: 1,841
Karma: 8006346
Join Date: Mar 2015
Device: Kindle, iOS
I have an atrocious Word Doc that resulted from scanning a paper book, and this thread encouraged me to finally attempt to convert it into an epub.

The DocXImport plugin was perfect (thank you DiapDealer!), but I also tried saving the Word Doc as filtered HTML, which resulted in a lot of crazy inline styles. I figured I could do a nuclear search/replace to clean it up, but my regex skills are weak.

<p.*?> picked up this:

Code:
<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>
but missed this:

Code:
<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:
0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:
inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>
As I said, the DocXImport plugin solved my problem by blasting away all the crazy styling, but I would really like to learn why my regex failed and what would work instead.
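
For anyone who wants to reproduce this outside the editor, here is a minimal Python sketch. It assumes the editor's regex engine treats `.` the way Python's `re` module does, which I have not confirmed. In Python, `.` does not match a line break by default, so `<p.*?>` can only match a tag that sits on a single line. The sample strings are the two tags above; the variable and function names are made up for the illustration.

```python
import re

# The two opening tags from this post; the second one wraps across three lines.
SINGLE_LINE = "<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>"
MULTI_LINE = (
    "<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:\n"
    "0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:\n"
    "inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>"
)

# A tiny stand-in document containing both paragraphs.
html = SINGLE_LINE + "text</p>\n" + MULTI_LINE + "more</p>\n"


def count_matches(pattern, text, flags=0):
    """Return how many non-overlapping matches the pattern finds in text."""
    return len(re.findall(pattern, text, flags))


# Default behaviour: "." stops at a newline, so only the one-line tag matches.
default_dot = count_matches(r"<p.*?>", html)

# DOTALL lets "." cross line breaks, so the wrapped tag matches too.
dotall = count_matches(r"<p.*?>", html, re.DOTALL)

# A negated character class matches anything except ">", newlines included,
# so it needs no flag at all.
negated = count_matches(r"<p[^>]*>", html)

# Stripping the attributes with the negated-class pattern.
cleaned = re.sub(r"<p[^>]*>", "<p>", html)

print(default_dot, dotall, negated)
print(cleaned)
```

In this sketch the original pattern finds one tag, while the DOTALL version and `<p[^>]*>` each find both. Note that `<p[^>]*>` would also match other tags starting with `<p`, such as `<pre>`, so it is still a fairly blunt instrument.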

Help, please, thank you!