10-12-2024, 12:07 AM   #2
Biblos
If I understand correctly, you want to remove the <div class="whatever"> tags and their matching </div> tags (and only those two tags, keeping the content between them) with a single regex. Is that it?
Here's a regex that does it:
https://regex101.com/r/NDUBOz/1

With only one regex, it's complicated. The regex above uses recursion into capture group 2, written (?2), to traverse the nested tags starting from the div tag with class "whatever". Group 3, which contains the recursion, is atomic, written (?>...), to avoid catastrophic backtracking. The next step will be to beautify the files again.

A more elegant solution would be to write a regex-function: the regex selects the group of divs and passes the selection to a Python function, which has the tools needed to find the matching </div>.
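To give an idea of what the Python side of that could look like, here is a minimal sketch, not a tested drop-in for any particular editor. It uses only the standard re module and counts nesting with a stack instead of regex recursion. The name unwrap_divs and the two patterns are my own for illustration, and it assumes the class attribute is written exactly as class="whatever" and that there are no self-closing <div/> tags.

```python
import re

# Matches any opening <div ...> or closing </div> tag.
DIV_TAG = re.compile(r'<div\b[^>]*>|</div\s*>', re.IGNORECASE)
# Assumption: the target is an opening div with class="whatever" (exact value).
TARGET = re.compile(r'<div\b[^>]*\bclass="whatever"[^>]*>', re.IGNORECASE)


def unwrap_divs(html: str) -> str:
    """Drop each target <div> and its matching </div>, keeping the content."""
    out = []     # pieces of the output text
    stack = []   # one flag per open div: True if that div is being dropped
    pos = 0
    for m in DIV_TAG.finditer(html):
        out.append(html[pos:m.start()])   # text between tags is kept as-is
        tag = m.group(0)
        if tag.startswith('</'):
            # A closing tag belongs to the most recently opened div.
            drop = stack.pop() if stack else False
        else:
            drop = TARGET.fullmatch(tag) is not None
            stack.append(drop)
        if not drop:
            out.append(tag)
        pos = m.end()
    out.append(html[pos:])
    return ''.join(out)


sample = ('<div class="whatever"><p>a</p><div class="x">'
          '<div class="whatever">b</div></div></div><div>c</div>')
result = unwrap_divs(sample)
print(result)
```

Because each </div> is paired with the last opened div, nested divs of any class are handled without recursion, and only the two tags of each "whatever" div disappear.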