08-14-2013, 07:32 AM   #2
DiapDealer
Quote:
Originally Posted by Man Eating Duck
* Apart from regex being hard for casual users to grasp, it is also theoretically impossible to parse HTML reliably with regex. I won't go into much detail, but here is a trivial example:
<p>
<span class="empty">A paragraph with <span class="italic">italics</span> in it.</span>
</p>
I've actually seen this very structure in the wild, with a matching (empty) .empty{} rule. If you want to remove the useless "empty" spans, an intuitive regex might be something like (?U)<span class="empty">(.*)</span>, replaced with \1. In the example above, this would extend the italic span to cover the rest of the paragraph: the ungreedy match stops at the first </span>, which closes the italic span, so that is the tag that gets removed.
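To see the failure concretely, here is a rough Python sketch of that naive replacement run against the sample markup. One assumption to flag: Python's re module has no (?U) "ungreedy" switch (in Python, (?U) means Unicode), so the lazy quantifier .*? stands in for it.

```python
import re

# The sample markup from the quoted post.
html = """<p>
<span class="empty">A paragraph with <span class="italic">italics</span> in it.</span>
</p>"""

# Lazy .*? plays the role of PCRE's (?U): the match ends at the FIRST </span>,
# which is the italic span's closing tag, not the empty span's.
naive = re.compile(r'<span class="empty">(.*?)</span>')

result = naive.sub(r"\1", html)
print(result)
```

The output keeps the italic span's opening tag but loses its closing tag, so the remaining </span> (originally the empty span's) now closes the italics after "in it.".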
That is why you would include the closing </p> in the match: to make sure you only catch the all-encompassing span.

Code:
(?U)<span class="empty">(.*)</span>\s+</p>
Replace with: \1\n</p>
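Here is the same kind of Python sketch with the </p> anchor added. As before, Python's lazy .*? stands in for (?U); the pattern and replacement otherwise mirror the ones above.

```python
import re

html = """<p>
<span class="empty">A paragraph with <span class="italic">italics</span> in it.</span>
</p>"""

# The lazy group first tries to stop at the italic </span>, but that one is
# followed by " in", not whitespace + </p>, so the engine keeps extending the
# match until it reaches the </span> that sits right before the closing </p>.
anchored = re.compile(r'<span class="empty">(.*?)</span>\s+</p>')

# re.sub turns \1 into the captured group and \n into a newline.
result = anchored.sub(r"\1\n</p>", html)
print(result)
```

Note that \s+ needs at least one whitespace character between the two closing tags; if they can sit directly next to each other (</span></p>), \s* would be the safer choice.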

I'm not arguing that a true parser wouldn't do a more effective (safer) job. It would. I just don't think it would be simple to give end users a configurable, flexible interface through which to tell the parser what they want (without writing code themselves).
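For comparison, here is roughly what the parser route looks like with Python's standard xml.etree.ElementTree. It's a minimal sketch, not how any particular editor does it: it assumes the markup is well-formed XML with no XHTML namespace (a real EPUB file would need the namespaced tag name), and the helpers unwrap and unwrap_class are hypothetical names made up for illustration. Even in this small case most of the work is bookkeeping for ElementTree's text and tail strings.

```python
import xml.etree.ElementTree as ET

def unwrap(parent, child):
    """Replace `child` in `parent` with the child's own text and sub-elements."""
    index = list(parent).index(child)
    kids = list(child)
    lead = child.text or ""   # text before the span's first sub-element
    tail = child.tail or ""   # text after the span's closing tag

    # ElementTree stores loose text in .text/.tail, so the span's leading text
    # is appended to the previous sibling's tail, or to the parent's text.
    prev = parent[index - 1] if index > 0 else None
    if prev is not None:
        prev.tail = (prev.tail or "") + lead
    else:
        parent.text = (parent.text or "") + lead

    # The span's tail must now follow its last sub-element (or the lead text).
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + tail
    elif prev is not None:
        prev.tail += tail
    else:
        parent.text += tail

    # Swap the span out for its sub-elements, keeping document order.
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(index + offset, kid)


def unwrap_class(root, cls):
    """Unwrap every <span class=cls> below root (hypothetical helper)."""
    targets = [(parent, child)
               for parent in root.iter()
               for child in parent
               if child.tag == "span" and child.get("class") == cls]
    # Reverse document order handles nested matches innermost-first.
    for parent, child in reversed(targets):
        unwrap(parent, child)


html = """<p>
<span class="empty">A paragraph with <span class="italic">italics</span> in it.</span>
</p>"""

root = ET.fromstring(html)
unwrap_class(root, "empty")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because the parser knows which </span> belongs to which <span>, no anchoring trick is needed; the trade-off is that the "what to unwrap" rule lives in code rather than in a pattern a user can type into a search box.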