08-14-2013, 10:07 AM   #8
Man Eating Duck
Addict
Man Eating Duck juggles neatly with hedgehogs.
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by theducks
I would love to see the regex (or any embedded tool) that can deal with matching up closing tags. (My regex-fu is basic)

The example only works for the case shown

I have seen (IMHO Word Processor? garbage)
Code:
<p>
<span class="normal"><span>A paragraph with</span></span> <span class="italic">italics</span> <span class="normal"><span>in it.</span></span>
</p>



If you run the cleanup above, you end up with broken first and last spans.
Yes, that's the point: regex *can* be useful for very basic things with HTML, but it is fundamentally incapable of parsing it reliably, because it cannot keep track of which closing tag belongs to which opening tag once elements are nested. You'll see this mentioned all over the internet if you google it. XML parsers, on the other hand, can do it properly. Cleaning up the mess left behind by EPUB-generating tools and by poor understanding of HTML is the main purpose of my idea, and using the XML library included in Sigil is one way to approach it.
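
To make the parser approach concrete, here is a rough sketch using Python's standard xml.etree.ElementTree module as a stand-in; it is not the XML library Sigil ships, and the function names (is_redundant, unwrap_redundant, clean) are made up for illustration. It also assumes that a span with no attributes, or with only class="normal", carries no meaning and can be unwrapped, and that the input is well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# The markup from the quoted post, joined onto one line.
SAMPLE = (
    '<p>'
    '<span class="normal"><span>A paragraph with</span></span> '
    '<span class="italic">italics</span> '
    '<span class="normal"><span>in it.</span></span>'
    '</p>'
)


def is_redundant(el):
    # Assumption: a span is noise if it has no attributes at all,
    # or nothing but class="normal".
    return el.tag == "span" and (not el.attrib or el.attrib == {"class": "normal"})


def unwrap_redundant(parent):
    # Clean the children first so nested wrappers collapse from the inside out.
    for child in list(parent):
        unwrap_redundant(child)

    i = 0
    while i < len(parent):
        child = parent[i]
        if not is_redundant(child):
            i += 1
            continue

        grandchildren = list(child)
        lead = child.text or ""
        tail = child.tail or ""

        # The wrapper's leading text moves to whatever precedes the wrapper.
        if i == 0:
            parent.text = (parent.text or "") + lead
        else:
            prev = parent[i - 1]
            prev.tail = (prev.tail or "") + lead

        # The wrapper's tail must follow its last surviving piece of content.
        if grandchildren:
            last = grandchildren[-1]
            last.tail = (last.tail or "") + tail
        elif i == 0:
            parent.text = (parent.text or "") + tail
        else:
            prev.tail = (prev.tail or "") + tail

        # Replace the wrapper with its (already cleaned) children.
        parent.remove(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(i + offset, grandchild)
        i += len(grandchildren)


def clean(markup):
    root = ET.fromstring(markup)
    unwrap_redundant(root)
    return ET.tostring(root, encoding="unicode")


print(clean(SAMPLE))
```

Because the parser works on a tree, each wrapper is removed together with its own closing tag, so there is nothing left to mismatch. The trade-off is that the parser refuses input that is not well-formed, which a regex would happily chew on.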

Edit: I see that the regex store already delivered on this one as well. Maybe we could just drop implementing the features, and instead have Sigil submit the files to DiapDealer for regex'ing?

Last edited by Man Eating Duck; 08-14-2013 at 10:11 AM.