MobileRead Forums - View Single Post

odamizu · 08-03-2019, 01:38 AM

I have an atrocious Word Doc that resulted from scanning a paper book, and this thread encouraged me to finally attempt to convert it into an epub.

The DocXImport plugin was perfect (thank you DiapDealer!), but I also tried saving the Word Doc as filtered HTML, which resulted in a lot of crazy inline styles. I figured I could do a nuclear search/replace to clean it up, but my regex skills are weak.

My search pattern <p.*?> picked up this:

Code:

<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>

but missed this:

Code:

<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:
0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:
inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>

As I said, the DocXImport solved my problem by blasting away all the crazy styling, but I would really like to learn why my regex failed and what would work instead.

Help, please, thank you!
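
For anyone who wants to reproduce this outside the editor, here is a minimal sketch in Python. It assumes the editor's regex engine treats newlines the way Python's re module does, which I have not verified. In Python, "." does not match a newline unless dot-all mode is switched on, and the tag that was missed is the one that wraps across lines. The variable names and the matches_whole helper are made up for illustration.

```python
import re

# The two tags from the post; the second one wraps across three lines.
short_tag = "<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>"
long_tag = (
    "<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:\n"
    "0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:\n"
    "inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>"
)

# Original pattern: "." stops at a newline by default in Python's re.
lazy_dot = re.compile(r"<p.*?>")

# Two possible alternatives (assumptions, not confirmed fixes for the editor):
dotall = re.compile(r"(?s)<p.*?>")   # (?s) lets "." match newlines too
negated = re.compile(r"<p[^>]*>")    # "anything except >" includes newlines


def matches_whole(pattern, text):
    """Hypothetical helper: True if the pattern's first match covers the entire tag."""
    m = pattern.search(text)
    return m is not None and m.group(0) == text


for name, pattern in [("lazy_dot", lazy_dot), ("dotall", dotall), ("negated", negated)]:
    print(name, matches_whole(pattern, short_tag), matches_whole(pattern, long_tag))
```

Note that all three patterns would also match other tags starting with "<p", such as <pre>, so something like <p\b[^>]*> might be safer for a real clean-up.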
