Old 10-12-2024, 07:58 AM   #8
foosion
Quote:
Originally Posted by Biblos View Post
What I understand is that you want to remove the <div class="whatever"> tags and their matching </div> tags (and only those two tags) with just one regex. Is that it?
Here's a regex that does it:
https://regex101.com/r/NDUBOz/1

With only one regex, it's complicated. The regex above uses recursion into capture group 2, written (?2), to traverse the nested tags starting from the div tag with class "whatever". Group 3, which contains the recursion, is atomic, written (?>...), to avoid catastrophic backtracking. The next step will be to beautify the files again.

A more elegant solution would have been to write a regex-function: the regex selects the group of divs and passes the selection to a Python function, which has the tools needed to find the matching </div>.
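For illustration, here is a minimal sketch of what the Python side of that idea could look like. It is not the regex from the link and it is not tied to any editor's regex-function API; the function name unwrap_divs and its class_name parameter are made up for this example. It assumes reasonably well-formed markup with no ">" inside attribute values, and it tracks nesting with a stack instead of regex recursion:

```python
import re

# Any opening or closing div tag (assumes no ">" inside attribute values).
DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)
# The class attribute of an opening tag (ignores names like data-class).
CLASS_ATTR = re.compile(r'(?<![\w-])class\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def unwrap_divs(html, class_name="whatever"):
    """Drop <div class="class_name"> tags and their matching </div> only.

    Hypothetical helper: the content and any other nested divs are kept.
    """
    out = []
    stack = []  # one bool per open div: True if that div is being removed
    pos = 0
    for m in DIV_TAG.finditer(html):
        tag = m.group(0)
        if tag.startswith("</"):
            # A closing tag belongs to the most recently opened div.
            remove = stack.pop() if stack else False
        else:
            cls = CLASS_ATTR.search(tag)
            remove = bool(cls) and class_name in cls.group(1).split()
            if not tag.endswith("/>"):  # self-closing divs have no </div>
                stack.append(remove)
        out.append(html[pos:m.start()])  # text between tags is kept as-is
        if not remove:
            out.append(tag)
        pos = m.end()
    out.append(html[pos:])
    return "".join(out)


sample = '<div class="whatever"><p>a</p><div class="keep"><div>b</div></div></div><div>c</div>'
print(unwrap_divs(sample))
```

Because each closing tag is paired with the most recent unclosed opening tag, only the </div> that belongs to a "whatever" div is dropped, however deeply other divs are nested inside it.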
Very impressive! I didn't think it was possible.