found myself parsing messy html today, removing empty <p> tags, <p> tags containing only &nbsp;, and things like <p><i></i></p> or <p><b> </b></p>, so that i could space the paragraphs consistently in css. inspired by
this thread, i thought i'd share the snippet in case anyone has a use for it.
i realize it could probably be more concise, and i wouldn't blindly replace every match without checking, but it seems to do the job. it removes <p> tags (with or without attributes) whose only content is nothing at all, whitespace, or a single <br>, <br/> or <br />, optionally wrapped in inline tags such as <b>, <i> or <span>.
Code:
<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>
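for anyone who wants to try it out, here is a rough sketch of the pattern applied in python with re.sub. the function name strip_empty_paragraphs and the sample string are made up for illustration, and the &nbsp; alternative in the middle group is an assumption about what the pattern is meant to match there.

```python
import re

# Pattern from the post. The "&nbsp;" alternative is an assumption:
# it is taken to be the entity the middle group is meant to match.
EMPTY_P = re.compile(
    r"<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>"
)


def strip_empty_paragraphs(html: str) -> str:
    """Remove <p> elements that hold nothing but whitespace, &nbsp; or a <br>."""
    return EMPTY_P.sub("", html)


# Hypothetical sample input mixing real paragraphs with empty ones.
sample = (
    "<p>keep me</p>"
    "<p></p>"
    "<p> </p>"
    "<p><i></i></p>"
    "<p><b> </b></p>"
    "<p><br /></p>"
    '<p class="x">&nbsp;</p>'
    "<p>also <b>kept</b></p>"
)

cleaned = strip_empty_paragraphs(sample)
print(cleaned)  # <p>keep me</p><p>also <b>kept</b></p>
```

as said above, it's worth eyeballing the matches first (for example with EMPTY_P.finditer) before replacing everything in one go.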