Removing Everything But Formatted Text
I've been looking for a way in Sigil to delete everything in an EPUB except the content between <p> tags. In other words, I want to remove everything in a file except what matches <p.*/p>.
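For illustration, here is a minimal Python sketch of the "keep only the <p> blocks" idea. It assumes the chapter's markup is available as a string (for example, read from one of the book's XHTML files) and that <p> elements are never nested. The function name `keep_paragraphs` is hypothetical. The pattern only looks at the <p> boundaries, so any <span>, <em>, etc. inside a paragraph are carried along unchanged.

```python
import re

# One <p> element: the opening tag (bare or with attributes), then the
# shortest run of characters up to the first closing </p>.
# (?:\s[^>]*)? stops the pattern from also matching <pre> or <param>.
# re.DOTALL lets "." cross line breaks; re.IGNORECASE also accepts <P>.
# Assumption: paragraphs are not nested, which holds for valid (X)HTML.
PARA_RE = re.compile(r"<p(?:\s[^>]*)?>.*?</p>", re.DOTALL | re.IGNORECASE)


def keep_paragraphs(html: str) -> str:
    """Return only the <p>...</p> blocks of `html`, one per line."""
    return "\n".join(PARA_RE.findall(html))


if __name__ == "__main__":
    sample = (
        "<html><head><title>Ch. 1</title></head><body>\n"
        "<h1>Chapter One</h1>\n"
        '<p class="first">It was a <em>dark</em> night.</p>\n'
        '<div><p>She said <span class="q">hello</span>.</p></div>\n'
        "</body></html>"
    )
    # Prints the two <p> blocks with their <em> and <span> tags intact.
    print(keep_paragraphs(sample))
```

The non-greedy `.*?` is what keeps each match from running on to the last `</p>` in the file, which is what the greedy `<p.*/p>` would do when the file is searched as a single line.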
It's easy to remove all the non-<p> tags with a regex and wind up with plain text, but I'm stumped about how to remove the non-<p> tags while keeping the ones inside paragraphs (such as <span>, <em>, etc.). I've Googled around, and the consensus seems to be that regex is useless for parsing nested HTML tags. Is that really true?
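For comparison, here is a sketch of the same filtering done with Python's standard-library `html.parser` instead of a regex. The usual warning about regex mostly applies when an element can contain another element of the same kind (a <div> inside a <div>): a regular expression can't count open and close tags to find the matching one. Valid HTML doesn't allow a <p> inside a <p>, but a parser sidesteps the question entirely. The class name `ParagraphKeeper` is hypothetical; it tracks whether it is inside a <p> and re-emits every tag and piece of text it sees there.

```python
from html.parser import HTMLParser


class ParagraphKeeper(HTMLParser):
    """Collects the markup of every <p> element, inline tags included."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.depth = 0        # > 0 while inside a <p>
        self.current = []     # pieces of the paragraph being built
        self.paragraphs = []  # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
        if self.depth:
            # get_starttag_text() returns the start tag exactly as written
            self.current.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> inside a paragraph
        if self.depth:
            self.current.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.current.append(f"</{tag}>")
        if tag == "p" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.paragraphs.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.current.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.current.append(f"&#{name};")


if __name__ == "__main__":
    keeper = ParagraphKeeper()
    keeper.feed('<h1>Title</h1><p>A <em>dark</em> night<br/></p><div>x</div>')
    keeper.close()
    print("\n".join(keeper.paragraphs))
```

Two trade-offs of this sketch: `html.parser` reports tag names in lowercase, so a closing `</EM>` comes back as `</em>`, and comments inside a paragraph are dropped because `handle_comment` is not overridden.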