Old 10-07-2024, 10:37 AM   #15
foosion
Evangelist
Posts: 479
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Quote:
Originally Posted by DiapDealer View Post
Using regex alone to do this sort of work is certainly anyone's prerogative (and I'm not hyping my own plugins for more usage). Just know that regex alone is more prone to break things -- especially where nested tags are concerned. Hence the reason I created my plugins in the first place. It uses an HTML parser to eliminate the possibility of breaking nested situations. Whereas regex alone will happily crash through nested spans and divs like a bull in a china shop -- with no concern for whether or not it's breaking anything.
That is an issue and one has to be careful. With nested tags, a pattern such as <div>(.*?)</div> will match up to the first closing </div> it encounters rather than the </div> that actually pairs with the opening <div>. It's easier if you're eliminating all of the divs (or spans) in a file, since then the pairing doesn't matter.
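To make the nesting problem concrete, here is a minimal sketch using Python's standard re module. The sample string and variable names are made up for illustration, and this is not how the plugin works internally; it only shows what a non-greedy pattern does on nested divs, and why stripping every div tag sidesteps the pairing question.

```python
import re

# Hypothetical sample markup with one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# The non-greedy group stops at the FIRST </div>, which belongs to the
# inner div, not at the </div> that closes the outer div.
match = re.search(r"<div>(.*?)</div>", html, flags=re.DOTALL)
captured = match.group(1)

# Unwrapping with that pattern removes the outer opening tag and the
# inner closing tag, so " tail" ends up inside the wrong div.
broken = re.sub(r"<div>(.*?)</div>", r"\1", html, flags=re.DOTALL)

# Removing ALL opening and closing div tags needs no pairing at all.
stripped = re.sub(r"</?div\b[^>]*>", "", html)

print(repr(captured))  # 'outer <div>inner'
print(repr(broken))    # 'outer <div>inner tail</div>'
print(repr(stripped))  # 'outer inner tail'
```

The result in broken still looks balanced, which is what makes this kind of damage easy to miss.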

There may be a complicated regex that avoids the problem.

BTW, if I wanted to eliminate all <section epub:type="bodymatter chapter"> tags or the like, how would I set up the plugin? I could use regex to add, for example, an id attribute and then match that id with the plugin, but that would seem to defeat the purpose.