Thread: Regex examples
01-09-2013, 09:08 AM   #165
mzmm
I found myself cleaning up messy HTML today: removing empty <p> tags, <p> tags containing only &nbsp;, and things like <p><i></i></p> or <p><b> </b></p>, so that I could space the paragraphs consistently in CSS. Inspired by this thread, I thought I'd share the snippet in case anyone has a use for it.

I realize it could probably be more concise, and I wouldn't blindly hit Replace All with it, but it seems to do the job. It removes <p> elements whose content, optionally wrapped in one or more tags such as <b>, <i>, or <span>, is either empty, only whitespace, a single &nbsp;, or a single <br>, <br/>, or <br />.

Code:
<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>
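If you want to try the pattern outside an editor's search box, here is a minimal Python sketch that applies it with `re.sub`, assuming the HTML is already loaded into a string. The pattern is copied unchanged from above; the function name `strip_empty_paragraphs` and the sample strings are hypothetical, just for illustration.

```python
import re

# The pattern from the post, unchanged. A raw string keeps the backslashes intact.
EMPTY_P = re.compile(
    r'<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>'
)


def strip_empty_paragraphs(html: str) -> str:
    """Hypothetical helper: delete every <p>...</p> whose content, optionally
    wrapped in inline tags, is empty, whitespace, one &nbsp; or one <br>."""
    return EMPTY_P.sub('', html)


if __name__ == '__main__':
    samples = [
        '<p></p>',
        '<p>&nbsp;</p>',
        '<p><i></i></p>',
        '<p><b> </b></p>',
        '<p class="x"><span><br /></span></p>',
        '<p>Real text stays.</p>',
        '<p>&nbsp; </p>',  # mixes &nbsp; and a space
    ]
    # Show each input next to what is left after the substitution.
    for html in samples:
        print(f'{html!r:45} -> {strip_empty_paragraphs(html)!r}')
```

Note that the middle group matches only one of its alternatives, so a paragraph that mixes &nbsp; with spaces, like the last sample, is left in place.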

Last edited by mzmm; 01-09-2013 at 09:15 AM.