Old 04-15-2021, 03:25 PM   #7
Tex2002ans
Quote:
Originally Posted by vijer
I want to search the file and remove all instances of <a id="pageXXX"></a> where XXX is the page number.

I have tried

[...]

What am I missing?
Let's break it down into 3 separate pieces:

Step 1. Find the link with an id of "page":
  • <a id="page

Step 2. Find the numbers:
  • ????????

Step 3. Find the closing quote + end of the link:
  • "></a>

Steps #1 and #3 are simple. You can type those in exactly as you would in a normal search!

But #2 is a little tricky:

How do you search for numbers in Regex?

Instead of doing 9 separate searches for:
  • page1
  • page2
  • page3
  • [...]
  • page9

you can instead say: "Hey, after 'page', look for a number!"

This is where Regex's special symbols come into play:

Brackets [] stand for: "Look for a single character in this spot, and it can be any ONE of the characters listed inside the brackets."

So [0123456789] says: "Hey, look for the number 0 OR the number 1 OR the number 2 ... OR the number 9".

Brackets are also special—you can also put in RANGES of characters:

Regex: page[0-9]

That says "Find the word 'page', then a number zero THROUGH nine".
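If you want to see this outside of your editor, here is a minimal Python sketch. The sample text and variable names are made up for illustration, and Python's re module is only one of many regex engines, but the bracket syntax works the same way in the editor's search box:

```python
import re

# Hypothetical sample text (not from the original book file).
sample = "page1 page27 pageX"

# [0-9] matches exactly ONE character in the range 0 through 9,
# so "page27" only matches up to "page2", and "pageX" does not match.
single_digit_matches = re.findall(r"page[0-9]", sample)

print(single_digit_matches)  # ['page1', 'page2']
```

Notice that "page27" is only matched as far as "page2". A single pair of brackets only ever grabs one character.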

But I don't just want to find a single number... I want lots of numbers. How do I do that?

The plus sign + stands for "ONE OR MORE of the previous thing."

Regex: page[0-9]+

Now this says: "Find 'page', then find ONE OR MORE numbers zero through nine."
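Here is a small Python sketch of the difference the plus sign makes. The sample text is invented for the example, so treat it as an illustration rather than anything from the actual file:

```python
import re

# Hypothetical sample text; the last "page" has no digits after it.
sample = "page1 page27 page123 page"

# [0-9]+ matches ONE OR MORE digits in a row, so whole numbers are captured.
# The bare "page" at the end is skipped because + needs at least one digit.
multi_digit_matches = re.findall(r"page[0-9]+", sample)

print(multi_digit_matches)  # ['page1', 'page27', 'page123']
```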

Putting It All Together

Here are the 3 pieces again:
  • Step 1: <a id="page
  • Step 2: [0-9]+
  • Step 3: "></a>

so your combined regex will be:

Search: <a id="page[0-9]+"></a>

which will match:

<a id="page1"></a>
<a id="page27"></a>
<a id="page123"></a>
<a id="page999"></a>
<a id="page123456"></a>
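Since the goal is to remove every page anchor, here is a rough Python sketch of the whole find-and-delete step. The HTML snippet and the names html, PAGE_ANCHOR, and cleaned are assumptions made up for this example:

```python
import re

# Hypothetical chapter snippet with two page anchors and one real link.
html = (
    '<p><a id="page1"></a>First paragraph.</p>\n'
    '<p><a id="page27"></a>Second <a href="notes.html">note</a>.</p>'
)

# Step 1 + Step 2 + Step 3 from above, glued into one pattern.
PAGE_ANCHOR = re.compile(r'<a id="page[0-9]+"></a>')

# Replacing every match with an empty string deletes the anchors.
cleaned = PAGE_ANCHOR.sub("", html)

print(cleaned)
```

The real link with href is left alone because it does not start with <a id="page. In an editor's Find & Replace, the equivalent should be leaving the Replace field empty and using Replace All.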


* * * * *

Extra: Regex's Special Symbol: \d

Just like the plus sign is a special symbol, there are also a few others.

Instead of typing "[0-9]" "[0-9]" "[0-9]" all the time, there's a shortcut for that:

\d = "Matches any single digit"

So these 2 are equivalent:
  • [0-9]
  • \d

So this says: "Find ONE OR MORE of any number zero through nine":
  • [0-9]+

and this says the same exact thing!:
  • \d+

So the searches recommended by JSWolf + BeckyEbook do the same thing:

Search: <a id="page[0-9]+"></a>
Search: <a id="page\d+"></a>
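A quick way to convince yourself the two searches agree is to run both on the same text. This is a minimal Python sketch with invented sample data. One caveat I am adding here: in Python, \d on normal text can also match digits from other scripts unless you pass re.ASCII, so the two are only strictly identical for ordinary 0-9 digits like the ones in these page ids.

```python
import re

# Hypothetical sample: two numeric page ids and one roman-numeral id.
sample = '<a id="page1"></a><a id="page999"></a><a id="pageiv"></a>'

# The same search written both ways.
with_range = re.findall(r'<a id="page[0-9]+"></a>', sample)
with_shortcut = re.findall(r'<a id="page\d+"></a>', sample)

print(with_range == with_shortcut)  # True
print(with_range)
```

Both versions skip "pageiv", since neither [0-9] nor \d matches letters.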

Last edited by Tex2002ans; 04-15-2021 at 03:34 PM.