02-24-2013, 09:52 AM   #1
Dybbuk
Removing Everything But Formatted Text

I've been looking for a way in Sigil to delete everything in an epub except the stuff between <p> tags. In other words, to remove everything in a file but <p.*/p>.

It's easy to remove all the non-<p> tags with a regex, and wind up with plain text, but I'm stumped about how to remove all the non-<p> tags except the ones inside paragraphs (such as <span>, <em>, etc.). I've Googled around and the consensus seems to be that regex is useless for parsing nested HTML tags. Is that really true?
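
To make the goal concrete, here is a rough sketch of the kind of result I'm after, written in Python rather than as a Sigil search. It leans on the fact that <p> elements don't nest inside each other in valid XHTML, so a non-greedy match from an opening <p> to the next </p> keeps the inline tags inside it. The names `P_ELEMENT`, `keep_paragraphs`, and `sample` are made up for illustration, and it assumes reasonably clean markup (no stray "</p>" inside comments or CDATA).

```python
import re

# One <p>...</p> element, with any attributes and inline children.
# "(?:\s[^>]*)?" allows attributes but stops <pre> from matching.
# Assumption: <p> elements are not nested, so non-greedy ".*?" is enough.
P_ELEMENT = re.compile(r"<p(?:\s[^>]*)?>.*?</p>", re.DOTALL | re.IGNORECASE)


def keep_paragraphs(markup: str) -> str:
    """Return only the <p> elements found in `markup`, one per line."""
    return "\n".join(P_ELEMENT.findall(markup))


# Hypothetical sample, not taken from a real epub.
sample = (
    '<body><h1>Title</h1><div class="x">'
    '<p>One <em>two</em> <span class="s">three</span></p>'
    '<pre>skip</pre><p class="c">Four</p></div></body>'
)

print(keep_paragraphs(sample))
```

Note that this throws away the <html>, <head>, and <body> wrappers as well, so the output would need to be put back into a valid XHTML skeleton before it could go back into an epub.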