12-19-2022, 10:27 PM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
|
Using Regex to find and remove unwanted mediawiki links
I've tried reading the documentation about Regex, and it's all gibberish to me. I have no idea what each symbol means or does, and the guides I've tried to read approach the subject as if I'm familiar with similar languages. I just need to know what I can type, and what each thing means/represents, to replace links like <a href="13th_Black_Crusade" title="wikilink"> with nothing, or a space if needed. I tried reading the Regex sticky on these forums and it's just {[{(^&*^%&^*^*(} to me. Any help would be appreciated.
|
12-19-2022, 10:49 PM | #2 |
Wizard
Posts: 1,094
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
If the string to remove is this, and let's assume there is a closing tag as well:
PHP Code:
But if there is something between the opening and closing tag that you need to save, you would use... PHP Code:
|
12-19-2022, 11:11 PM | #3 |
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
I had to eat that elephant one bite at a time....and I'm nowhere close to being as good as some of the reg-fu masters around here... but I can get most of what I want done with a few of the basics like Karellen mentioned.
This website has a basic description that can get you started understanding what regex is about: https://www.developer.com/languages/...ssions-primer/ but that website's examples are aimed more at programmers.
Once you have a basic understanding, you can use this website to learn the specific flavor of regex that Sigil uses (PCRE): https://www.regular-expressions.info/pcre.html
And here is a basic memory aid that helps until you just memorize them from using them all the time: http://marvin.cs.uidaho.edu/~heckend...uts/regex.html
Once you have a basic idea of what is involved, the regex sticky is where people go to share (or ask) how to do some specific things. |
12-20-2022, 12:14 AM | #4 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Replace: <a>
- - -
In Plain English, what does this regex do? There are 3 key parts of the Search:
1. <a href="
2. [^"]+
3. " title="wikilink">
Replace with: <a>
- - -
Before:
Code:
<a href="13th_Black_Crusade" title="wikilink">
<a href="14th_Black_Crusade" title="wikilink">
<a href="15th_Black_Crusade" title="wikilink">
<a href="BlahBlahBlah" title="wikilink">
After:
Code:
<a>
<a>
<a>
<a>
Quote:
I did step-by-step breakdowns on some Regex, plus I linked to a ton of my previous posts about the subject. Many of my MobileRead topics even have color-coded regex, so you can see which piece does what. Last edited by Tex2002ans; 12-20-2022 at 06:54 AM. |
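For anyone who wants to try the three-part Search outside Sigil first, here is a minimal sketch in Python. It assumes the full Search is simply the three parts above joined together, and it uses Python's `re` module rather than Sigil's PCRE engine (for a pattern this simple the two should behave the same). The names `WIKILINK_OPEN` and `blank_wikilinks` are made up for illustration.

```python
import re

# Assumed full Search: the three parts from the post joined together.
#   <a href="            literal start of the opening tag
#   [^"]+                one or more characters that are NOT a double quote
#   " title="wikilink">  literal end of the opening tag
WIKILINK_OPEN = re.compile(r'<a href="[^"]+" title="wikilink">')


def blank_wikilinks(html: str) -> str:
    """Replace every wikilink opening tag with a bare <a> (hypothetical helper)."""
    return WIKILINK_OPEN.sub("<a>", html)


before = (
    '<a href="13th_Black_Crusade" title="wikilink"> '
    '<a href="14th_Black_Crusade" title="wikilink"> '
    '<a href="BlahBlahBlah" title="wikilink">'
)
after = blank_wikilinks(before)
print(after)
```

Because part 2 is "anything that is not a quote", the match stops at the closing quote of the href, so tags with a different title are left alone.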
||
12-20-2022, 12:24 AM | #5 | |
Bibliophagist
Posts: 35,393
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
12-20-2022, 06:41 AM | #6 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
- - - Side Note: But, in this very specific case, I always just trusted Sigil:
to auto-remove those dangling/orphaned </a>: Code:
disappeared link</a>
Another time I use that is when I'm fixing up TOCs! I'm just too lazy to correct all the nested mess, so I:
and am left with a very clean TOC! One of these days, I should update those Saved Searches, but they've been serving me so well for 10+ years!!! Last edited by Tex2002ans; 12-20-2022 at 07:52 AM. |
||
12-20-2022, 03:19 PM | #7 | |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
|
So, that will remove the whole link? <a href="words in here" title="wikilink"><a>?
Quote:
|
|
12-20-2022, 03:41 PM | #8 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I would use TagMechanic to remove the <a... from the start.
|
12-20-2022, 04:17 PM | #9 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
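Think something like this. You are trying to get rid of the first <a> (the href="Extra" one):
Code:
<a href="Extra"></a><a href="Correct">Clickable Link</a>
If the opening tag gets removed along with the wrong closing </a>, you are left with mismatched tags:
Code:
</a><a href="Correct">Clickable Link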
This is why it's sometimes easier to do things in stages, instead of all in one swoop.
- - - - -
This is where DiapDealer's Sigil plugin, TagMechanic, would help:
That makes sure to match every single open <a> with its matching closing </a>.
- - - - -
You would take your original code:
Code:
<a href="BlahBlahBlah" title="wikilink">A link we don't want.</a>
<a href="BlahBlahBlah2" title="wikilink">A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
Blank out the links you don't want:
Code:
<a>A link we don't want.</a>
<a>A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
and it would find all the blank <a>s, with nothing in them, and delete them:
Code:
A link we don't want.
A link we don't want.
<a href="3rd-Example" title="wikilink">A link we want.</a>
Side Note: I wrote a few TagMechanic tutorials/tips over the years:
It's very helpful for cleaning up code like this. Want to get rid of all the:
No problem! Want to convert:
No problem! Last edited by Tex2002ans; 12-20-2022 at 04:22 PM. |
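The two-stage idea above (blank the unwanted opening tags first, then remove the bare <a>...</a> pairs) can be roughed out in plain Python too. This is only a sketch and not how TagMechanic works internally. It assumes that links are never nested, that a "blank" link is exactly `<a>` with no attributes, and, purely for the example, that the unwanted links are the ones whose href starts with `BlahBlahBlah`. The names `UNWANTED_OPEN`, `BARE_LINK` and `remove_unwanted_links` are hypothetical.

```python
import re

# Stage 1 (assumed pattern): blank the opening tags we do not want.
# "Unwanted" here is hypothetically any href starting with BlahBlahBlah.
UNWANTED_OPEN = re.compile(r'<a href="BlahBlahBlah[^"]*" title="wikilink">')

# Stage 2: unwrap attribute-less <a>...</a> pairs, keeping the text inside.
# .*? is non-greedy, so each <a> pairs with the nearest following </a>.
BARE_LINK = re.compile(r"<a>(.*?)</a>", re.DOTALL)


def remove_unwanted_links(html: str) -> str:
    """Blank unwanted links, then drop the bare <a>/</a> pair around their text."""
    blanked = UNWANTED_OPEN.sub("<a>", html)
    return BARE_LINK.sub(r"\1", blanked)


original = (
    '<a href="BlahBlahBlah" title="wikilink">A link we don\'t want.</a> '
    '<a href="BlahBlahBlah2" title="wikilink">A link we don\'t want.</a> '
    '<a href="3rd-Example" title="wikilink">A link we want.</a>'
)
cleaned = remove_unwanted_links(original)
print(cleaned)
```

Removing the opening and closing tag together in stage 2 is what avoids the dangling </a> problem: the closing tag only goes away when it belongs to a blank <a>.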
|
12-20-2022, 05:55 PM | #10 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
12-20-2022, 06:37 PM | #11 | |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
|
You turned a multiday job into a 30 second job. If even 30 seconds. Thank you so much for that.
Quote:
|
|
12-20-2022, 06:40 PM | #12 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
|
12-21-2022, 05:06 AM | #13 | ||
Wizard
Posts: 1,094
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Quote:
Happy that it was helpful for you Quote:
Some of the regex strings that have been created in our Kodi project are just mind-boggling. How one of our developers had the time, patience and knowledge to put this together is amazing. And this is only one of about 8 scrapers, although they are no longer in use as we have moved on to Python scrapers. https://github.com/xbmc/repo-scraper...b.org/tmdb.xml |
||
12-21-2022, 05:11 AM | #14 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
TagMechanic would have been a one-time job to get rid of the links in one shot.
|
12-21-2022, 03:17 PM | #15 |
Not Quite Dead
Posts: 194
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
More and more, after puzzling with regex, I find myself taking CSS shortcuts that deal with the problem in a non-destructive way.
In the case of the OP, the links could be made to disappear using an attribute selector: a[title="wikilink"] {visibility: hidden;} |