MobileRead Forums - View Single Post - Proposal: CSS "normalisation" functionality in Sigil

DiapDealer · 08-14-2013, 08:32 AM

Quote:

Originally Posted by Man Eating Duck

* Apart from regex being complicated to grasp for casual users, it is also theoretically impossible to reliably parse html with regex. I won't go into much detail, but a trivial example:

A paragraph with italics in it.

I've actually seen this very structure in the wild, with a corresponding .empty{}. If you want to remove the useless "empty" spans, an intuitive regex might be something like (?U)<span class="empty">(.*)</span>, replace with \1. In the example above this would extend the italic span to encompass the rest of the paragraph.

Which is why you would include the closing </p> in the match, to make sure you only got the all-encompassing span.

Code:

(?U)<span class="empty">(.*)</span>\s+</p>

Replace with: \1\n</p>
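For anyone who wants to see the difference outside of Sigil, here is a minimal sketch in Python. The sample markup is a guess at the structure being discussed (the class name "italic" is made up for illustration). Python's re module has no (?U) flag, so the lazy quantifier .*? stands in for it, and \s* is used instead of \s+ because this sample has no whitespace between the closing tags.

```python
import re

# Hypothetical markup: an all-encompassing "empty" span around a nested span.
# The class name "italic" is an assumption for illustration only.
html = ('<p><span class="empty">A paragraph with '
        '<span class="italic">italics</span> in it.</span></p>')

# Naive pattern: the lazy group stops at the FIRST </span>,
# which closes the inner span, not the "empty" one.
naive = re.sub(r'<span class="empty">(.*?)</span>', r'\1', html)

# Anchored pattern: requiring </p> right after the closing tag forces
# the match to end at the outer </span>.
anchored = re.sub(r'<span class="empty">(.*?)</span>\s*</p>', r'\1\n</p>', html)

print(naive)     # the italic span now runs to the end of the paragraph
print(anchored)  # the inner span is intact, the "empty" span is gone
```

The anchored version only helps when the span really does wrap the whole paragraph. An "empty" span in the middle of a paragraph would still need a different pattern.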

I'm not arguing that a true parser wouldn't do a more effective (safer) job. It would. I just don't think it would be a very simple task to provide an end user with a configurable, flexible interface to the parser in order to inform it of their desires (without actually writing code themselves).
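To give an idea of what the parser route looks like, here is a rough sketch using Python's standard html.parser. It is not how Sigil does (or would do) anything; the class UnwrapSpans and the function unwrap are hypothetical names. It tracks open spans on a stack, so nesting is handled wherever the span sits. It also silently drops comments and doctypes, so treat it as an illustration only.

```python
from html.parser import HTMLParser


class UnwrapSpans(HTMLParser):
    """Re-emit markup, dropping <span class="..."> wrappers of one class."""

    def __init__(self, css_class):
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.out = []
        self.dropped = []  # one bool per currently open <span>

    def handle_starttag(self, tag, attrs):
        drop = tag == "span" and dict(attrs).get("class") == self.css_class
        if tag == "span":
            self.dropped.append(drop)
        if not drop:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Skip the closing tag that pairs with a dropped opening span.
        if tag == "span" and self.dropped and self.dropped.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def unwrap(html, css_class="empty"):
    parser = UnwrapSpans(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<p><span class="empty">A paragraph with '
          '<span class="italic">italics</span> in it.</span></p>')
print(unwrap(sample))
```

Even this small case takes a fair amount of code, and the only knob exposed is the class name, which is rather the point about how hard a flexible end-user interface would be.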