Quote:
Originally Posted by theducks
I would love to see the REGEX (or any embedded tool) that can deal with matching up closing tags. (My REGEX-fu is basic.)
The example only works for the case shown.
I have seen (IMHO word-processor garbage):
Code:
<p>
<span class="normal"><span>A paragraph with</span></span> <span class="italic">italics</span> <span class="normal"><span>in it.</span></span>
</p>
If you run the cleanup above, you end up with broken first and last spans.
Yes, the point is that while regex *can* be useful for very basic things with HTML, it is fundamentally incapable of "parsing" it reliably: a plain regular expression has no way to keep track of which closing tag belongs to which opening tag once elements are nested. You'll see this mentioned all over the internet if you google it. XML parsers, on the other hand, can do it properly. Cleaning up the mess left behind by epub-generating tools and by poor understanding of HTML is the main purpose of my idea, and using the XML library included in Sigil is one way to approach it.
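To make that concrete, here is a rough sketch of how a parser-based cleanup could handle the quoted example. It is only an illustration: it uses Python's standard xml.etree.ElementTree as a stand-in for whatever XML library a Sigil feature would really use, the helper names (unwrap_bare_spans, _unwrap, _append_text) are made up for this post, and it assumes a well-formed fragment without the XHTML namespace that a real epub file would carry. It also does not try to unwrap the root element itself.

```python
import xml.etree.ElementTree as ET

# The fragment from the quoted post (no XHTML namespace, for simplicity).
SAMPLE = (
    '<p>\n'
    '<span class="normal"><span>A paragraph with</span></span> '
    '<span class="italic">italics</span> '
    '<span class="normal"><span>in it.</span></span>\n'
    '</p>'
)


def _append_text(parent, index, text):
    """Attach text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def _unwrap(parent, index, child):
    """Replace `child` with its own content (text, children, tail)."""
    inner = list(child)
    parent.remove(child)
    _append_text(parent, index, child.text)
    for offset, node in enumerate(inner):
        parent.insert(index + offset, node)
    # The text that followed the span now follows whatever took its place.
    _append_text(parent, index + len(inner), child.tail)


def unwrap_bare_spans(root):
    """Unwrap every <span> that has no attributes; restart after each change."""
    changed = True
    while changed:
        changed = False
        for parent in root.iter():
            for index, child in enumerate(list(parent)):
                if child.tag == "span" and not child.attrib:
                    _unwrap(parent, index, child)
                    changed = True
                    break
            if changed:
                break
    return root


root = ET.fromstring(SAMPLE)
cleaned = ET.tostring(unwrap_bare_spans(root), encoding="unicode")
print(cleaned)
```

Because the parser has already paired every opening tag with its closing tag, the cleanup works on whole elements and cannot leave a dangling </span> behind, which is exactly where the regex approach breaks the first and last spans.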
Edit: I see that the regex store already delivered on this one as well. Maybe we could just drop implementing the features, and instead have Sigil submit the files to DiapDealer for regex'ing?