Originally Posted by Dybbuk
In that case, why does Sigil - whose whole purpose is editing HTML - use regex? (Not complaining, Sigil is awesome. Just curious.)
Because there's a difference between editing and parsing. The Find & Replace feature in most editing software really has nothing to do with parsing code. F&R doesn't even know what "code" is. It's simply searching text for patterns you specify. Regex just happens to be one of the most flexible, powerful, and common ways to do that.
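Here is a small Python sketch that makes the editing-vs-parsing difference concrete. Python's `re` stands in for the regex engine in an editor's Find & Replace box (Sigil's own engine may differ in details), and the sample markup is made up. The pattern happily "finds" paragraphs inside a comment and inside a JavaScript string, because it only sees characters. The standard-library `html.parser` tokenizer understands HTML syntax and sees just one real `<p>` element.

```python
import re
from html.parser import HTMLParser

# Made-up sample: one real paragraph, one commented out, one inside a JS string.
SAMPLE = """<p>Real paragraph.</p>
<!-- <p>Commented-out paragraph.</p> -->
<script>var s = "<p>Not a paragraph</p>";</script>"""

# Find & Replace style: a plain text pattern with no notion of markup,
# so every "<p>...</p>" character sequence counts as a hit.
regex_hits = re.findall(r"<p>(.*?)</p>", SAMPLE)


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags as an HTML tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Comments go to handle_comment and <script> contents go to
        # handle_data, so neither reaches this method.
        if tag == "p":
            self.count += 1


counter = ParagraphCounter()
counter.feed(SAMPLE)
counter.close()

print(regex_hits)     # ['Real paragraph.', 'Commented-out paragraph.', 'Not a paragraph']
print(counter.count)  # 1
```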
What you want to do goes beyond the normal definition of editing, or even of searching and replacing. You're looking for something that automatically transforms code into new code: new code whose conventions you want to be able to specify (preferably with no data loss). That's a whole different ball o' wax, and not something that's easily incorporated into a program. Not without hard-coding the transformation rules, anyway, which would seriously limit the feature's usefulness to an end user.
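To show what "hard-coding the transformation rules" looks like in practice, here is a minimal Python sketch built on the standard `html.parser` module. The `DROP` and `UNWRAP` sets and the `RuleRewriter` class are hypothetical, chosen only for illustration: elements in `DROP` are removed along with their contents, tags in `UNWRAP` are removed but their contents kept, and everything else (attributes, entities, comments) is passed through as written. It assumes reasonably clean markup; `html.parser` is a tokenizer, not a full HTML5 tree builder.

```python
from html.parser import HTMLParser

# Hypothetical hard-coded rules: these two sets ARE the whole transformation.
DROP = {"script", "style"}  # remove the element and everything inside it
UNWRAP = {"div", "span"}    # remove the tag itself but keep its content


class RuleRewriter(HTMLParser):
    def __init__(self):
        # Keep entity/char references unconverted so they are re-emitted verbatim.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP element

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag not in UNWRAP:
            self.out.append(self.get_starttag_text())  # original text, attributes intact

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in UNWRAP:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep them unless a rule says otherwise.
        if self.skip_depth == 0 and tag not in DROP | UNWRAP:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self.skip_depth == 0:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")  # e.g. an <?xml ...?> declaration


rewriter = RuleRewriter()
rewriter.feed('<div class="x"><p class="a">Hi &amp; bye<br/>again</p>'
              '<script>var x = 1;</script></div>')
rewriter.close()
result = "".join(rewriter.out)
print(result)  # <p class="a">Hi &amp; bye<br/>again</p>
```

Any change in what the user considers "outside" means editing `DROP` and `UNWRAP` in the code itself, which is exactly the limitation for an end-user feature.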
I don't want to strip the style. I want to keep the style and formatting, and remove everything outside of the paragraphs, such as <div>, <script>, etc.
What about headers? Blockquotes? There are all kinds of situations that can arise in ePubs where text you definitely don't want to lose occurs outside of the <p> tags.
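Before stripping everything outside `<p>`, it may be worth checking what would actually be lost. This is a minimal audit sketch using Python's standard `html.parser`; the `OutsideParagraphs` class and the sample markup are made up for illustration. It treats any visible text not nested inside a `<p>` as "at risk" and reports it with its nearest enclosing tag, skipping `<script>`/`<style>` contents. It assumes reasonably well-formed (X)HTML, since `html.parser` does not apply the HTML5 rules for implied or auto-closed tags.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OutsideParagraphs(HTMLParser):
    """Collects (enclosing tag, text) for visible text not inside any <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open non-void elements
        self.orphans = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> hold no text and open no scope

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag (tolerates sloppy nesting).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or "p" in self.stack or {"script", "style"} & set(self.stack):
            return
        parent = self.stack[-1] if self.stack else "(root)"
        self.orphans.append((parent, text))


sample = """<body>
<h2>Chapter One</h2>
<p>First paragraph.</p>
<blockquote>A quotation not wrapped in p.</blockquote>
<div class="poem">Roses are red<br/>violets are blue</div>
<script>var x = 1;</script>
</body>"""

audit = OutsideParagraphs()
audit.feed(sample)
audit.close()
for parent, text in audit.orphans:
    print(f"{parent}: {text}")
```

For this sample it reports the heading, the blockquote, and both lines of the poem: all real book text that a "keep only the paragraphs" rule would throw away.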