Thread: Regex examples
Old 01-09-2013, 09:08 AM   #165
mzmm
found myself parsing messy html today, removing empty <p> tags, <p> tags containing only &nbsp;, and things like <p><i></i></p> or <p><b> </b></p>, so that i could space the paragraphs consistently in css. inspired by this thread, thought i'd share the snippet in case anyone has a use for it.

i realize it could probably be more concise, and i wouldn't just blindly replace all, but it seems to do the job. it removes <p> tags (with or without attributes) whose only content is nothing at all, whitespace, a single &nbsp;, or a <br>, <br/> or <br />, optionally wrapped in inline tags such as <b>, <i> or <span>.

Code:
<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>
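
a rough sketch of how the pattern could be tried out before committing to a replace-all, assuming python's re module (the post doesn't say which editor or regex engine was used, so treat that as an assumption). the function name strip_empty_paragraphs and the sample markup are made up for illustration.

```python
import re

# Pattern copied verbatim from the post; only the Python wrapper is new.
EMPTY_P = re.compile(
    r'<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>'
)


def strip_empty_paragraphs(html: str) -> str:
    """Delete every <p> element that the pattern treats as empty."""
    return EMPTY_P.sub('', html)


# Hypothetical sample: two paragraphs with real content, five "empty" ones.
sample = (
    '<p>keep me</p>'
    '<p></p>'
    '<p>&nbsp;</p>'
    '<p><i></i></p>'
    '<p class="x"><b> </b></p>'
    '<p><br /></p>'
    '<p>also <b>kept</b></p>'
)

# Review what would be removed before replacing anything.
for match in EMPTY_P.finditer(sample):
    print('would remove:', match.group(0))

cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

on that sample only the two paragraphs with real text are left. one limit worth knowing: the middle part accepts a single &nbsp;, so a paragraph like <p>&nbsp;&nbsp;</p> is left alone.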

Last edited by mzmm; 01-09-2013 at 09:15 AM.