Thread: Regex examples
12-04-2025, 08:42 AM   #812
patrik
I'm removing many <a href="..."> ... </a> wrappers around text (in many files). Most of the time my regex works fine, but sometimes it catches too much.

Example:
<a href="../../f70d_0040.smil#rgn_txt_0040_0006" class="pcalibre2 calibre4 pcalibre1 pcalibre">text</a>
should become
text

I use:

Search:
<a href=".*?(?<=smil).*?">(.*?)</a>
Replace:
\1

But in a case like the one below, it matches the whole line instead of only the second <a href> (the one with ".smil" inside). I want to keep the first link, which is the note reference.

<a href="v014884_split_068.html#note-d2e8351" class="pcalibre noteref pcalibre1 pcalibre2" id="id_3738">88</a> <span id="id_3739"> <a href="../../d202_100.smil#txt_2555" class="pcalibre calibre4 pcalibre1 pcalibre2">text</a>
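
To show what I mean, here is a small Python sketch that reproduces the over-match with the standard `re` module (my assumption being that it behaves like the editor's search here). The names `PATTERN` and `LINE` are just made up for the illustration. As far as I can tell, the first lazy `.*?` is free to run past the closing quote and the `>` of the first link, because the lookbehind only asks that "smil" appears somewhere before the end of the match.

```python
import re

# The search pattern from above, unchanged.
PATTERN = r'<a href=".*?(?<=smil).*?">(.*?)</a>'

# The problem line from above, split only for readability.
LINE = (
    '<a href="v014884_split_068.html#note-d2e8351" '
    'class="pcalibre noteref pcalibre1 pcalibre2" id="id_3738">88</a> '
    '<span id="id_3739"> '
    '<a href="../../d202_100.smil#txt_2555" '
    'class="pcalibre calibre4 pcalibre1 pcalibre2">text</a>'
)

match = re.search(PATTERN, LINE)

# The match starts at the first <a href=" and only stops at the last </a>,
# so the note link is swallowed together with the .smil link.
print(match.group(0) == LINE)

# The replacement keeps only the captured group, so the note is lost.
print(re.sub(PATTERN, r'\1', LINE))
```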


I've been playing around but can't seem to find a reliable way to match only the <a href="...">...</a> elements whose href contains ".smil".
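
One direction that might be worth trying is replacing the `.*?` parts with negated character classes, so the match cannot leave the href value or the opening tag. The sketch below is only that, a sketch: the pattern and the names `SMIL_LINK` and `strip_smil_links` are my own assumptions, it is tested on nothing but the two examples in this post, and it assumes the href is the first attribute and the link text sits on one line.

```python
import re

# Candidate pattern (an assumption, not a proven fix):
#   [^"]*  stays inside the quoted href value
#   [^>]*  stays inside the opening <a ...> tag
SMIL_LINK = re.compile(r'<a href="[^"]*\.smil[^"]*"[^>]*>(.*?)</a>')


def strip_smil_links(html: str) -> str:
    """Replace each <a> whose href contains ".smil" with its inner text."""
    return SMIL_LINK.sub(r'\1', html)


simple = (
    '<a href="../../f70d_0040.smil#rgn_txt_0040_0006" '
    'class="pcalibre2 calibre4 pcalibre1 pcalibre">text</a>'
)
note = (
    '<a href="v014884_split_068.html#note-d2e8351" '
    'class="pcalibre noteref pcalibre1 pcalibre2" id="id_3738">88</a>'
)
mixed = (
    note + ' <span id="id_3739"> '
    '<a href="../../d202_100.smil#txt_2555" '
    'class="pcalibre calibre4 pcalibre1 pcalibre2">text</a>'
)

print(strip_smil_links(simple))  # text
print(strip_smil_links(mixed))   # note link kept, .smil link unwrapped
```

In the search box that would be `<a href="[^"]*\.smil[^"]*"[^>]*>(.*?)</a>` with `\1` as the replacement, assuming the editor accepts the same syntax as Python here.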

Any ideas from you gurus?