#1
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
Using Regex to find and remove unwanted mediawiki links
I've tried reading the documentation about Regex, and it's all gibberish to me. I have no idea what each symbol means or does, and the guides I've tried to read approach the subject as if I'm already familiar with similar languages. I just need to know what I can type, and what each thing means/represents, to replace links like <a href="13th_Black_Crusade" title="wikilink"> with nothing, or a space if needed. I tried reading the Regex sticky on these forums and it's just {[{(^&*^%&^*^*(} to me. Any help would be appreciated.
#2
Wizard
Posts: 1,611
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
If the string to remove is this (and let's assume there is a closing tag as well):
PHP Code:
PHP Code:
But if there is something between the opening and closing tag that you need to save, you would use... PHP Code:
PHP Code:
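The two patterns from this post did not survive in this copy of the thread, so here is a rough Python sketch of the two cases instead. The patterns and the sample sentence are assumptions, not Karellen's exact code, and Python's `re` module is not identical to the PCRE flavor Sigil uses, although for simple patterns like these the syntax is the same.

```python
import re

# Made-up sample text containing one wikilink.
text = 'See <a href="13th_Black_Crusade" title="wikilink">the 13th Black Crusade</a> for details.'

# Case 1: remove the whole link (opening tag, link text, closing tag).
# [^"]+ matches one or more characters that are not a double quote (the link target).
# .*? matches the link text lazily, stopping at the first </a>.
remove_all = re.sub(r'<a href="[^"]+" title="wikilink">.*?</a>', '', text)

# Case 2: keep what is between the tags.
# The parentheses capture the link text; \1 in the replacement puts it back.
keep_text = re.sub(r'<a href="[^"]+" title="wikilink">(.*?)</a>', r'\1', text)

print(remove_all)  # See  for details.
print(keep_text)   # See the 13th Black Crusade for details.
```

In case 1, notice the double space left where the link used to be; that is where the "or a space if needed" part of the question comes in.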
#3
A Hairy Wizard
Posts: 3,353
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
I had to eat that elephant one bite at a time... and I'm nowhere close to being as good as some of the reg-fu masters around here, but I can get most of what I want done with a few of the basics like the ones Karellen mentioned.
This website has a basic description that can get you started understanding what regex is about: https://www.developer.com/languages/...ssions-primer/ but that website's examples are aimed more at programmers.
Once you have a basic understanding, you can use this website to learn the specific flavor of regex that Sigil uses (PCRE): https://www.regular-expressions.info/pcre.html
And here is a basic memory aid that helps until you memorize the symbols from using them all the time: http://marvin.cs.uidaho.edu/~heckend...uts/regex.html
Once you have a basic idea of what is involved, the regex sticky is where people go to share (or ask about) how to do specific things.
#4
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Replace: <a>
- - -
In Plain English, what does this regex do? There are 3 key parts of the Search:
1. <a href=" matches that exact text, the start of the link.
2. [^"]+ matches one or more characters that are not a double quote, so it covers whatever the link target is (for example 13th_Black_Crusade or BlahBlahBlah).
3. " title="wikilink"> matches that exact text, the rest of the opening tag.
Replace with:
<a>
- - -
Before:
Code:
<a href="13th_Black_Crusade" title="wikilink">
<a href="14th_Black_Crusade" title="wikilink">
<a href="15th_Black_Crusade" title="wikilink">
<a href="BlahBlahBlah" title="wikilink">
After:
Code:
<a>
<a>
<a>
<a>
Quote:
I did step-by-step breakdowns on some Regex, plus I linked to a ton of my previous posts about the subject. Many of my MobileRead topics even have color-coded regex, so you can see which piece does what.
Last edited by Tex2002ans; 12-20-2022 at 06:54 AM.
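To see the three parts in action, here is a small Python sketch that runs the same Search/Replace over the Before lines, and also shows why [^"]+ is used rather than .+ for the link target. Python's `re` stands in for Sigil's PCRE here, and the two-links-on-one-line sample is made up for illustration.

```python
import re

# The three parts of the Search, joined into one pattern:
#   <a href="             literal text: the start of the link
#   [^"]+                 one or more characters that are NOT a double quote
#   " title="wikilink">   literal text: the rest of the opening tag
pattern = r'<a href="[^"]+" title="wikilink">'

before = [
    '<a href="13th_Black_Crusade" title="wikilink">',
    '<a href="14th_Black_Crusade" title="wikilink">',
    '<a href="15th_Black_Crusade" title="wikilink">',
    '<a href="BlahBlahBlah" title="wikilink">',
]

# Replace each matching opening tag with a bare <a>.
after = [re.sub(pattern, '<a>', line) for line in before]
print(after)  # ['<a>', '<a>', '<a>', '<a>']

# Made-up line with two links, to show why [^"]+ is safer than .+
two_links = '<a href="A" title="wikilink">x</a> <a href="B" title="wikilink">y</a>'

# [^"]+ cannot cross a quote, so each opening tag is matched on its own.
safe = re.sub(pattern, '<a>', two_links)
print(safe)    # <a>x</a> <a>y</a>

# .+ is greedy and can cross quotes, so one match swallows the first link.
greedy = re.sub(r'<a href=".+" title="wikilink">', '<a>', two_links)
print(greedy)  # <a>y</a>
```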
#5
Bibliophagist
Posts: 46,206
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
#6
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Quote:
- - -
Side Note: But, in this very specific case, I always just trusted Sigil:
to auto-remove those dangling/orphaned </a>:
Code:
disappeared link</a>
Another time I use that is when I'm fixing up TOCs! I'm just too lazy to correct all the nested mess, so I:
and am left with a very clean TOC! One of these days, I should update those Saved Searches, but they've been serving me so well for 10+ years!!!
Last edited by Tex2002ans; 12-20-2022 at 07:52 AM.
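A dangling </a> like the one above is a closing tag with no opening <a> left to pair with. Sigil cleans these up for you; the Python sketch below only illustrates the idea with a simple counter and is not how Sigil does it internally. It assumes the only tags that matter are <a ...> and </a>, and that links are never nested (nested links are not valid HTML anyway). The function name is made up for this example.

```python
import re

def drop_orphan_closing_a(html: str) -> str:
    """Remove </a> tags that have no earlier unmatched <a ...> to close."""
    out = []
    open_count = 0  # how many <a ...> tags are still waiting for their </a>
    pos = 0
    # Find every opening <a> (with or without attributes) and every </a>.
    # \b stops <a from also matching tags like <abbr>.
    for m in re.finditer(r'<a\b[^>]*>|</a>', html):
        out.append(html[pos:m.start()])  # copy the text before this tag
        tag = m.group(0)
        if tag == '</a>':
            if open_count > 0:
                open_count -= 1
                out.append(tag)  # properly paired, keep it
            # else: orphaned closing tag, so it is simply not copied
        else:
            open_count += 1
            out.append(tag)
        pos = m.end()
    out.append(html[pos:])  # copy whatever follows the last tag
    return ''.join(out)

print(drop_orphan_closing_a('disappeared link</a>'))
print(drop_orphan_closing_a('<a href="x">kept</a> and disappeared link</a>'))
```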
#7
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
So, that will remove the whole link? <a href="words in here" title="wikilink"><a>?
Quote:
#8
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I would use TagMechanic to remove the <a... from the start.
#9
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Think of something like this. You are trying to get rid of the first <a> (the empty one):
Code:
<a href="Extra"></a><a href="Correct">Clickable Link</a>
but you could accidentally end up with this instead:
Code:
</a><a href="Correct">Clickable Link
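The exact regex behind that broken result isn't shown, but one easy way to get it is a greedy .* between the tags. The Python sketch below is an assumption about how such a mistake happens: a greedy (.*) runs to the last </a> on the line, while a lazy (.*?) stops at the first one.

```python
import re

line = '<a href="Extra"></a><a href="Correct">Clickable Link</a>'

# Greedy: .* grabs as much as possible, so the match ends at the LAST </a>.
greedy = re.sub(r'<a href="Extra">(.*)</a>', r'\1', line)
print(greedy)  # </a><a href="Correct">Clickable Link

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST </a>.
lazy = re.sub(r'<a href="Extra">(.*?)</a>', r'\1', line)
print(lazy)    # <a href="Correct">Clickable Link</a>
```

The lazy version happens to work here, but a regex still doesn't really know which </a> belongs to which <a>; it only knows "nearest" or "farthest".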
This is why it's sometimes easier to do things in stages, instead of all in one swoop.
- - - - -
This is where DiapDealer's Sigil plugin would help:
That makes sure to match every single open <a> with its matching closing </a>.
- - - - -
You would take your original code:
Code:
<a href="BlahBlahBlah" title="wikilink">A link we don't want.</a>
<a href="BlahBlahBlah2" title="wikilink">A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
Then run the Search/Replace from above to get:
Code:
<a>A link we don't want.</a>
<a>A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
Then the plugin would find all the bare <a>s, with nothing left in them, and delete them along with their matching </a>, keeping the text:
Code:
A link we don't want.
A link we don't want.
<a href="3rd-Example" title="wikilink">A link we want.</a>
Side Note: I wrote a few TagMechanic tutorials/tips over the years:
It's very helpful for cleaning up code like this.
Want to get rid of all the:
No problem! Want to convert:
No problem!
Last edited by Tex2002ans; 12-20-2022 at 04:22 PM.
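The plugin does that second stage by actually pairing each <a> with its own </a>. As a rough regex-only approximation, assuming no nested links and no line breaks inside a link, the Python sketch below unwraps the bare <a> tags left by the first stage and leaves the link that still has its attributes alone.

```python
import re

# Output of the first stage, one link per line.
stage1 = (
    "<a>A link we don't want.</a>\n"
    "<a>A link we don't want.</a>\n"
    '<a href="3rd-Example" title="wikilink">A link we want.</a>'
)

# Second stage: unwrap every bare <a> (no attributes), keeping its text.
# <a> only matches a tag with nothing inside, so <a href=...> is untouched.
# Without re.DOTALL, . does not match a newline, so a link split across
# lines would be missed (one of the assumptions above).
stage2 = re.sub(r'<a>(.*?)</a>', r'\1', stage1)
print(stage2)
```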
#10
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
#11
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
You turned a multi-day job into a 30-second job, if even 30 seconds. Thank you so much for that.
Quote:
#12
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
#13
Wizard
Posts: 1,611
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Happy that it was helpful for you!
Quote:
Some of the regex strings that have been created in our Kodi project are just mind-boggling. How one of our developers had the time, patience, and knowledge to put this together is amazing. And this is only one of about 8 scrapers, although they are no longer in use as we have moved on to Python scrapers. https://github.com/xbmc/repo-scraper...b.org/tmdb.xml
#14
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
TagMechanic would have gotten rid of the links in one shot.
#15
Not Quite Dead
Posts: 195
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
More and more, after puzzling with regex, I find myself taking CSS shortcuts that deal with the problem in a non-destructive way.
In the case of the OP, the links could be made to disappear using an attribute selector: a[title="wikilink"] {visibility: hidden;} (this hides the link text too, while the space it took up stays on the page).
Tags
find and replace, reg expressions
Thread | Thread Starter | Forum | Replies | Last Post |
How do I remove unwanted highlighting? | bizzybody | Calibre | 2 | 01-09-2020 10:45 AM |
Links as unwanted footnotes | jemandy | Editor | 1 | 07-21-2018 09:51 PM |
Removing unwanted RegEx's | dc696969 | Library Management | 1 | 03-27-2013 04:08 AM |
Regex Help: Find page number & Replace+Remove 2x Line Breaks in Sigil | Contre-jour | Sigil | 9 | 02-01-2013 10:47 AM |
Create MediaWiki and RTF links for opening an ePub file with Calibre viewer | johnsidi | Calibre | 1 | 12-17-2011 01:31 PM |