MobileRead Forums > E-Book Software > Sigil

Old 12-19-2022, 10:27 PM   #1
aknight2015
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
Using Regex to find and remove unwanted mediawiki links

I've tried reading the documentation about Regex, and it's all gibberish to me. I have no idea what each symbol means or does, and the guides I've tried to read approach the subject as if I'm familiar with similar languages. I just need to know what I can type, and what each thing means/represents, to replace links like <a href="13th_Black_Crusade" title="wikilink"> with nothing, or a space if needed. I tried reading the Regex sticky on these forums and it's just {[{(^&*^%&^*^*(} to me. Any help would be appreciated.
Old 12-19-2022, 10:49 PM   #2
Karellen
Wizard
Posts: 1,094
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
If the string to remove is this, and let's assume there is a closing tag as well:

Code:
<a href="13th_Black_Crusade" title="wikilink"></a>
Your search regex would be...
Code:
<a href=".*?" title=".*?"></a>
.*? = anything between the two quotation marks

But if there is something between the opening and closing tag that you need to save, you would use...
Code:
<a href=".*?" title=".*?">(.*?)</a>
and in the replace box you would use...

Code:
\1
\1 = whatever the (.*?) in parentheses captured, i.e. the text between the opening and closing tag
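For anyone who wants to try this outside Sigil, here is a minimal sketch of the capture-group search using Python's `re` module. That is an assumption made purely for illustration: Sigil itself uses PCRE, though this particular pattern should behave the same way in both. The sample sentence and the `unwrap_links` name are made up.

```python
import re

# Lazy .*? matches as little as possible; (.*?) captures the link text.
LINK = re.compile(r'<a href=".*?" title=".*?">(.*?)</a>')

def unwrap_links(html: str) -> str:
    """Replace each matched link with only its captured text (group 1)."""
    return LINK.sub(r"\1", html)

sample = 'The <a href="13th_Black_Crusade" title="wikilink">13th Black Crusade</a> began.'
print(unwrap_links(sample))
```

The opening and closing tags are dropped and only the text between them is kept.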
Old 12-19-2022, 11:11 PM   #3
Turtle91
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
I had to eat that elephant one bite at a time... and I'm nowhere close to being as good as some of the reg-fu masters around here, but I can get most of what I want done with a few of the basics like Karellen mentioned.

This website has a basic description that can get you started understanding what regex is about:
https://www.developer.com/languages/...ssions-primer/

But that website's examples are aimed more at programmers. Once you have a basic understanding, you can use this website to learn the specific flavor of regex that Sigil uses (PCRE):
https://www.regular-expressions.info/pcre.html

And here is a basic memory aid that helps until you just memorize them from using them all the time:
http://marvin.cs.uidaho.edu/~heckend...uts/regex.html

Once you have a basic idea of what is involved, then the regex sticky is where people go to share (or ask) how to do some specific things.
Old 12-20-2022, 12:14 AM   #4
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by aknight2015 View Post
[...] to remove links like <a href="13th_Black_Crusade" title="wikilink"> with nothing, or a space if needed.
Search: <a href="[^"]+" title="wikilink">
Replace: <a>

- - -

In Plain English, what does this regex do?

There are 3 key parts to the Search:

1. <a href="
  • = Look for the beginning of the <a> link.

2. [^"]+
  • = Look for 1 OR MORE "any character that ISN'T a double quote".
  • In Regex-speak, the symbols:
    • [] = Anything inside these brackets? "Look for THIS LIST OF CHARACTERS in this spot!"
    • The ^ sign is special, and says "Hey, you see the characters inside the brackets? Find anything that's NOT these!"
    • The + sign says "Look for 1 OR MORE of the previous thing."

3. " title="wikilink">
  • = Look for the rest of that <a> link.

Replace with:
  • A blank <a>

- - -

Before:

Code:
<a href="13th_Black_Crusade" title="wikilink">
<a href="14th_Black_Crusade" title="wikilink">
<a href="15th_Black_Crusade" title="wikilink">
<a href="BlahBlahBlah" title="wikilink">
After:


Code:
<a>
<a>
<a>
<a>
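The Before/After above can be reproduced with a short Python sketch. Python's `re` module stands in for Sigil's Find & Replace here (an assumption for illustration only), and the `strip_wikilink_attrs` name is made up.

```python
import re

# [^"]+ = one or more characters that are NOT a double quote.
WIKILINK_OPEN = re.compile(r'<a href="[^"]+" title="wikilink">')

def strip_wikilink_attrs(html: str) -> str:
    """Turn every wikilink opening tag into a bare <a>."""
    return WIKILINK_OPEN.sub("<a>", html)

before = "\n".join([
    '<a href="13th_Black_Crusade" title="wikilink">',
    '<a href="14th_Black_Crusade" title="wikilink">',
    '<a href="15th_Black_Crusade" title="wikilink">',
    '<a href="BlahBlahBlah" title="wikilink">',
])
print(strip_wikilink_attrs(before))
```

Opening tags whose title is something other than "wikilink" are left alone, because the pattern spells that attribute out literally.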
Quote:
Originally Posted by aknight2015 View Post
I've tried reading the documentation about Regex, and it's all gibberish to me. I have no idea what each symbol means or does, and the guides I've tried to read approach the subject as if I'm familiar with similar languages. I just need to know what I can type, and what each thing means/represents
See my recent post from last month:

I did step-by-step breakdowns on some Regex, plus I linked to a ton of my previous posts about the subject.

Many of my MobileRead topics even have color-coded regex, so you can see which piece does what.

Last edited by Tex2002ans; 12-20-2022 at 06:54 AM.
Old 12-20-2022, 12:24 AM   #5
DNSB
Bibliophagist
Posts: 35,393
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Tex2002ans View Post
Search: <a href="[^"]+" title="wikilink">
Replace: <--- PUT BLANK/NOTHING HERE.

[...]
Replace with:

NOTHING!
Sorry to disagree, but I would suggest replacing the <a href..."wikilink"> with <a> otherwise you'll be hunting down and replacing all the </a>s that have been orphaned. After the search/replace, I'd use DiapDealer's TagMechanic to remove the naked <a>whatever</a> pairs.
Old 12-20-2022, 06:41 AM   #6
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by DNSB View Post
Sorry to disagree, but I would suggest replacing the <a href..."wikilink"> with <a> otherwise you'll be hunting down and replacing all the </a>s that have been orphaned.
Yes, good catch. I'll correct the post.

Quote:
Originally Posted by DNSB View Post
After the search/replace, I'd use DiapDealer's TagMechanic to remove the naked <a>whatever</a> pairs.
Yes, yes, if you want to be much safer, always use TagMechanic.

- - -

Side Note: But, in this very specific case, I always just trusted Sigil:
  • Tools > Reformat HTML > Mend and Prettify All HTML Files

to auto-remove those dangling/orphaned </a>:

Code:
disappeared link</a>
But... I know what I'm doing with regex + I run it immediately after doing that pass.
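To make "dangling/orphaned </a>" concrete, here is a rough Python sketch that drops any closing </a> with no opening <a> before it. This is not how Sigil's Mend works internally (that is not covered in this thread); it is just a simple counter, and the `drop_orphaned_closers` name is made up.

```python
import re

# Matches an opening <a ...> tag or a closing </a> tag.
TAG = re.compile(r"<a\b[^>]*>|</a>")

def drop_orphaned_closers(html: str) -> str:
    """Remove every </a> that has no unclosed <a ...> before it."""
    out = []
    depth = 0  # number of <a> tags currently open
    pos = 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        if m.group() == "</a>":
            if depth == 0:
                continue  # orphaned closer: skip it
            depth -= 1
        else:
            depth += 1
        out.append(m.group())
    out.append(html[pos:])
    return "".join(out)

print(drop_orphaned_closers("disappeared link</a>"))
```

A real HTML parser copes with far messier markup than this counter does, which is a good reason to let Sigil do the mending.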

Another time I use that is when I'm fixing up TOCs! I'm just too lazy to correct all the nested mess, so I:
  • Run Tools > Table of Contents > Create HTML Table of Contents.
  • Use Saved Searches + regex to wipe most of it away.
  • Run Mend+Prettify.

and am left with a very clean TOC!

One of these days, I should update those Saved Searches, but they've been serving me so well for 10+ years!!!

Last edited by Tex2002ans; 12-20-2022 at 07:52 AM.
Old 12-20-2022, 03:19 PM   #7
aknight2015
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
So, that will remove the whole link? <a href="words in here" title="wikilink"></a>?

Quote:
Originally Posted by Karellen View Post
If the string to remove is this, and let's assume there is a closing tag as well [...]
Old 12-20-2022, 03:41 PM   #8
JSWolf
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I would use TagMechanic to remove the <a... from the start.
Old 12-20-2022, 04:17 PM   #9
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by aknight2015 View Post
So, that will remove the whole link? <a href="words in here" title="wikilink"></a>?
You usually want to be VERY careful trying to capture/delete "everything between the <a> + </a>"—especially if you're new to regex—because you can sometimes have VERY nasty code (or edge-cases) in your books.

Think something like this.

You are trying to get rid of the first <a> (the empty "Extra" one):

Code:
<a href="Extra"></a><a href="Correct">Clickable Link</a>
If you aren't careful, regex could accidentally do something like this instead:

Code:
</a><a href="Correct">Clickable Link
You see how:
  • The 1st <a> was removed
  • But the 2nd link's </a> disappeared?

This is why it's sometimes easier to do things in stages, instead of in one fell swoop.
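One way to end up with exactly that broken result is a greedy `.*` between the tags. This is an assumption, since the example above does not say which pattern went wrong. A minimal Python sketch comparing greedy and lazy matching on the same line:

```python
import re

html = '<a href="Extra"></a><a href="Correct">Clickable Link</a>'

# Greedy .* runs all the way to the LAST </a> on the line.
greedy = re.sub(r'<a href="Extra">(.*)</a>', r"\1", html)

# Lazy .*? stops at the FIRST </a> it can find.
lazy = re.sub(r'<a href="Extra">(.*?)</a>', r"\1", html)

print(greedy)  # </a><a href="Correct">Clickable Link
print(lazy)    # <a href="Correct">Clickable Link</a>
```

The lazy version gets this line right, but it can still mis-pair tags when links are nested, which is where a tag-aware tool is safer.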

- - - - -

This is where DiapDealer's TagMechanic plugin for Sigil would help.
It makes sure to match every single opening <a> with its matching closing </a>.

- - - - -

You would take your original code:

Code:
<a href="BlahBlahBlah" title="wikilink">A link we don't want.</a>
<a href="BlahBlahBlah2" title="wikilink">A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
Step 1: Clean It:

Code:
<a>A link we don't want.</a>
<a>A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
Step 2: Run TagMechanic and choose:
  • Action Type: Delete
  • Tag Name: a
  • Having the Attribute: No attributes ("naked" tag)

and it would find all the blank <a>s—with nothing in them—and delete them:

Code:
A link we don't want.
A link we don't want.
<a href="3rd-Example" title="wikilink">A link we want.</a>
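TagMechanic's internals are not shown in this thread, so the following is only a rough Python sketch of the idea behind Step 2: walk the markup with a real parser, remember for each opening <a> whether it was "naked", and drop the closing </a> that pairs with it. It uses the standard library's `html.parser`; the class and function names are made up, and it is meant for simple fragments like the ones above.

```python
from html.parser import HTMLParser

class NakedAnchorStripper(HTMLParser):
    """Drops <a> tags that have no attributes, plus their matching </a>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # one flag per open <a>: True if it was naked

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            naked = not attrs
            self.stack.append(naked)
            if naked:
                return  # skip the naked opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.stack and self.stack.pop():
            return  # skip the closer that pairs with a naked <a>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

def strip_naked_anchors(html: str) -> str:
    parser = NakedAnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

step1 = (
    "<a>A link we don't want.</a>\n"
    "<a>A link we don't want.</a>\n"
    '<a href="3rd-Example" title="wikilink">A link we want.</a>'
)
print(strip_naked_anchors(step1))
```

Because each closing tag is paired with its own opening tag through the stack, a naked <a> wrapped around a real link loses only its own pair of tags.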
- - -

Side Note: I wrote a few TagMechanic tutorials/tips over the years:

It's very helpful for cleaning up code like this.

Want to get rid of all the:
  • <span class="useless">
  • <a class="junk">
  • empty <span></span> around everything?

No problem!

Want to convert:
  • <span class="italics"> -> <i>
  • <span class="bold"> -> <b>
  • <span class="calibre123"> -> <em>

No problem!

Last edited by Tex2002ans; 12-20-2022 at 04:22 PM.
Old 12-20-2022, 05:55 PM   #10
JSWolf
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Tex2002ans View Post
  • <span class="calibre123"> -> <em>
Do you really want to go there yet again?
Old 12-20-2022, 06:37 PM   #11
aknight2015
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2022
Device: Galaxy Tab SM-T500
You turned a multiday job into a 30 second job. If even 30 seconds. Thank you so much for that.

Quote:
Originally Posted by Karellen View Post
If the string to remove is this, and let's assume there is a closing tag as well [...]
Old 12-20-2022, 06:40 PM   #12
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Old 12-21-2022, 05:06 AM   #13
Karellen
Wizard
Posts: 1,094
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by aknight2015 View Post
You turned a multiday job into a 30 second job. If even 30 seconds. Thank you so much for that.
Great!!
Happy that it was helpful for you.


Quote:
Originally Posted by Turtle91 View Post
I had to eat that elephant one bite at a time....and I'm nowhere close to being as good as some of the reg-fu masters around here... but I can get most of what I want done with a few of the basics like Karellen mentioned.
Yep, it was a pretty steep learning curve. I am nowhere near guru level, but strong enough to do the things I need with ebooks and other similar projects. I've been trying to dive into Python also, but life keeps interrupting me. Oh well.

Some of the regex strings that have been created in our Kodi project are just mind-boggling. How one of our developers had the time, patience and knowledge to put this together is amazing. And this is only one of about 8 scrapers, although they are no longer in use as we have moved on to Python scrapers.

https://github.com/xbmc/repo-scraper...b.org/tmdb.xml
Old 12-21-2022, 05:11 AM   #14
JSWolf
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
TagMechanic would have been a one-time job to get rid of the links in one shot.
Old 12-21-2022, 03:17 PM   #15
Brett Merkey
Not Quite Dead
Posts: 194
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
More and more, after puzzling with regex, I find myself taking CSS shortcuts that deal with the problem in a non-destructive way.

In the case of the OP, the links could be made to disappear using an attribute selector:

a[title="wikilink"] {visibility: hidden;}

Tags
find and replace, reg expressions

