MobileRead Forums - View Single Post

foosion · 10-07-2024, 10:37 AM

Quote:

Originally Posted by DiapDealer

Using regex alone to do this sort of work is certainly anyone's prerogative (and I'm not hyping my own plugins for more usage). Just know that regex alone is more prone to break things -- especially where nested tags are concerned. Hence the reason I created my plugins in the first place. They use an HTML parser to eliminate the possibility of breaking nested situations. Whereas regex alone will happily crash through nested spans and divs like a bull in a china shop -- with no concern for whether or not it's breaking anything.

That is an issue and one has to be careful. With nested tags, a pattern such as <div>(.*?)</div> matches up to the first closing </div> it encounters rather than the </div> that actually closes the opening <div>. It's easier if you're eliminating all of the divs (or spans) in a file, since then the pairing doesn't matter.
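To make the nesting problem concrete, here is a minimal Python sketch. The sample string and variable names are made up for illustration; only the standard `re` module is used.

```python
import re

# Hypothetical sample markup: an outer div wrapping an inner div.
html = "<div>outer <div>inner</div> tail</div>"

# The lazy group stops at the FIRST </div>, which belongs to the inner div.
match = re.search(r"<div>(.*?)</div>", html, flags=re.DOTALL)
captured = match.group(1)
print(captured)   # outer <div>inner

# "Unwrapping" with the same pattern pairs the outer <div> with the inner </div>.
unwrapped = re.sub(r"<div>(.*?)</div>", r"\1", html, count=1, flags=re.DOTALL)
print(unwrapped)  # outer <div>inner tail</div>
```

The result is still well-formed here, but it is not what was intended: the outer opening tag was removed together with the inner closing tag, so " tail" now sits inside the inner div.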

There may be a complicated regex that avoids the problem.
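One way around it, sketched here as an assumption rather than as anything the plugins actually do, is to use regex only to find the div tags and then count nesting depth in code. Python's standard `re` module has no recursive patterns, so the pairing is done with a counter. The names `TAG`, `find_matching_close` and `unwrap_first_div` are hypothetical.

```python
import re

# Matches opening and closing div tags; group(1) is "/" for a closing tag.
TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)

def find_matching_close(html, open_start):
    """Return the (start, end) span of the </div> that closes the <div>
    beginning at open_start, or None if the markup is unbalanced."""
    depth = 0
    for m in TAG.finditer(html, open_start):
        if m.group(1):          # closing tag
            depth -= 1
            if depth == 0:
                return m.span()
        else:                   # opening tag
            depth += 1
    return None

def unwrap_first_div(html):
    """Remove the first <div> and its matching </div>, keeping the content."""
    first = TAG.search(html)
    if first is None or first.group(1):
        return html             # no div, or the first div tag is a closing one
    close = find_matching_close(html, first.start())
    if close is None:
        return html             # unbalanced: leave the text untouched
    return html[:first.end() - len(first.group(0))] + html[first.end():close[0]] + html[close[1]:]

print(unwrap_first_div("<div>outer <div>inner</div> tail</div>"))
# outer <div>inner</div> tail
```

This is still not a parser: self-closing `<div/>` tags, divs inside comments or CDATA, and attribute values containing `>` would all throw the count off, which is the argument for a real HTML parser.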

BTW, if I wanted to eliminate all <section epub:type="bodymatter chapter"> tags or the like, how would I set up the plugin? I could use regex to add, for example, an id attribute and then match that id with the plugin, but that would seem to defeat the purpose.