#1
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2017
Device: phone
Regex to remove html tags
I've been searching for a solution for hours, but haven't found any examples that help.
I want to search the file and remove all instances of <a id="pageXXX"></a>, where XXX is the page number. I have tried:
(^<a id="page)(.*:?)("></a>)
(^<a id=\\"page)(.*:?)(\\"></a>)
(^<a id="page)([0-9]+)("></a>)
(^<a id=\\"page)([0-9]+)(\\"></a>)
What am I missing?
#2
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2017
Device: phone
Found the answer.
(<a id=\"page)(.*:?)(\"></a>)
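One way to see what changed between the failed attempts and the working pattern is to run them all against a sample line. This sketch uses Python's `re` module as a stand-in for the editor's search box (an assumption: the editor may use a different regex flavor, though these simple patterns should behave alike) and a hypothetical line where the anchor follows a `<p>` tag. It shows two things: `^` only matches at the start of the line, so the anchored attempts fail when the tag sits mid-line, and `\\"` asks for a literal backslash followed by a quote, while `\"` is just an escaped quote that matches `"`.

```python
import re

# Hypothetical sample: the page anchor sits after <p>, not at the start of the line.
line = '<p><a id="page12"></a>It was a dark night.</p>'

attempts = [
    r'(^<a id="page)(.*:?)("></a>)',
    r'(^<a id=\\"page)(.*:?)(\\"></a>)',
    r'(^<a id="page)([0-9]+)("></a>)',
    r'(^<a id=\\"page)([0-9]+)(\\"></a>)',
]
found_answer = r'(<a id=\"page)(.*:?)(\"></a>)'

for pattern in attempts:
    # ^ pins the match to the start of the string, so none of these match mid-line.
    print(pattern, "->", re.search(pattern, line))

# Without ^ the pattern may match anywhere; \" is simply an escaped quote.
print(found_answer, "->", re.search(found_answer, line).group(0))

# The plain-quote attempts do work when the tag really is at the start of the line.
print(re.search(attempts[2], '<a id="page12"></a>') is not None)
```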
#3
Guru
Posts: 839
Karma: 2657572
Join Date: Jan 2017
Location: Poland
Device: Various
Code:
<a id="page\d+"></a>
You will understand everything.
#4
Grand Sorcerer
Posts: 28,568
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
What's the colon for?
Code:
(.*:?)
I'd probably use something like:
Code:
<a id="page\d+"></a>
Code:
<a id="page[^>]+"></a>
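On the colon question: in `(.*:?)` the `:?` only means "an optional colon", so it adds nothing here. The bigger risk is that `.*` is greedy and runs to the last possible match on the line. This sketch uses Python's `re` (standing in for the editor's engine, an assumption) and a hypothetical line holding two anchors with text between them. The greedy pattern swallows "Chapter One", while `\d+` and `[^>]+` each stay inside a single tag.

```python
import re

# Hypothetical line with two page anchors and real text between them.
line = '<a id="page1"></a>Chapter One<a id="page2"></a>'

patterns = {
    "greedy": r'<a id="page.*:?"></a>',   # .* runs to the LAST "></a> on the line
    "digits": r'<a id="page\d+"></a>',    # only digits may follow "page"
    "not >": r'<a id="page[^>]+"></a>',   # anything except > may follow "page"
}

for name, pattern in patterns.items():
    # Replace each match with nothing, as in an empty Replace box.
    print(f"{name:7} -> {re.sub(pattern, '', line)!r}")
```

Note that `[^>]+` is looser than `\d+`: it would also match an id such as `pagexiv`, which may or may not be what you want.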
#5
Guru
Posts: 972
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-T2, Kindle Paperwhite 11th gen
Disregard
#6
Resident Curmudgeon
Posts: 79,740
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Search: <a id="page[0-9]+"></a>
Replace: (leave empty)
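Leaving Replace empty deletes every match. As a sketch of the same search-and-replace outside the editor, assuming Python's `re` behaves like the editor for this simple pattern, `re.sub` with an empty replacement string does the job. The function name `strip_page_anchors` and the sample text are hypothetical.

```python
import re

# Search pattern from the post above; the replacement is the empty string.
PAGE_ANCHOR = re.compile(r'<a id="page[0-9]+"></a>')

def strip_page_anchors(html: str) -> str:
    """Delete every empty page anchor such as <a id="page42"></a>."""
    return PAGE_ANCHOR.sub("", html)

sample = '<p><a id="page7"></a>Some text.</p>\n<p>More<a id="page8"></a> text.</p>'
print(strip_page_anchors(sample))
```

Other anchors, such as `<a id="note1"></a>`, are left alone because the pattern requires `page` followed by digits.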
#7
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Step 1. Find the link with an id of "page":
Step 2. Find the numbers:
Step 3. Find the closing quote + end of the link:
Steps #1 and #3 are simpler. You can type those in just like a normal search! But #2 is a little tricky: How do you search for numbers in Regex? Instead of doing 9 separate searches for:
you can instead say: "Hey, after 'page', look for a number!"
This is where Regex's special symbols come into play:
Brackets [] stand for: "Look for a single character in this spot." So [0123456789] says: "Hey, look for the number 0 OR the number 1 OR the number 2 ... OR the number 9".
Brackets can also hold RANGES of characters:
Regex: page[0-9]
That says "Find the word 'page', then a number zero THROUGH nine".
But I don't just want to find a single number... I want lots of numbers. How do I do that?
The plus sign + stands for "ONE OR MORE of the previous thing."
Regex: page[0-9]+
Now this says: "Find 'page', then find ONE OR MORE numbers zero through nine."
Putting It All Together
Let me color-code the 3 pieces:
so your combined regex will be:
Search: <a id="page[0-9]+"></a>
which will match:
<a id="page1"></a>
<a id="page27"></a>
<a id="page123"></a>
<a id="page999"></a>
<a id="page123456"></a>
* * * * *
Extra: Regex's Special Symbol: \d
Just like the plus sign is a special symbol, there are also a few others. Instead of typing "[0-9]" all the time, there's a shortcut for that:
\d = "Matches any digit"
So these 2 are equivalent:
So this says: "Find ONE OR MORE of any number zero through nine":
and this says the same exact thing!:
So the searches recommended by JSWolf + BeckyEbook do the same thing:
Search: <a id="page[0-9]+"></a>
Search: <a id="page\d+"></a>
Last edited by Tex2002ans; 04-15-2021 at 03:34 PM.
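To make the building blocks concrete, here is a small sketch in Python's `re` (standing in for the editor's engine, which is an assumption) that checks each step: `[0-9]` alone grabs one digit, `+` extends it to all of them, and both the bracket form and the `\d` form accept every example tag from the list above. It also shows one caveat specific to Python 3 string patterns: `\d` matches any Unicode decimal digit unless the `re.ASCII` flag is set, so the two forms only coincide on ordinary 0-9 digits. Other regex engines may treat `\d` differently.

```python
import re

# [0-9] is a single digit, so "page[0-9]" stops after the first digit.
print(re.match(r"page[0-9]", "page123").group(0))   # page1
# + repeats the previous item ONE OR MORE times.
print(re.match(r"page[0-9]+", "page123").group(0))  # page123

# The example tags from the list above.
examples = [
    '<a id="page1"></a>',
    '<a id="page27"></a>',
    '<a id="page123"></a>',
    '<a id="page999"></a>',
    '<a id="page123456"></a>',
]
bracket_form = r'<a id="page[0-9]+"></a>'
shortcut_form = r'<a id="page\d+"></a>'

for tag in examples:
    # Both searches accept every tag in the list.
    print(tag, bool(re.fullmatch(bracket_form, tag)), bool(re.fullmatch(shortcut_form, tag)))

# Caveat for Python 3 str patterns: \d also matches non-ASCII decimal digits
# (here ARABIC-INDIC DIGIT ONE and TWO) unless re.ASCII is given.
arabic = "page\u0661\u0662"
print(bool(re.fullmatch(r"page\d+", arabic)))            # True
print(bool(re.fullmatch(r"page\d+", arabic, re.ASCII)))  # False
print(bool(re.fullmatch(r"page[0-9]+", arabic)))         # False
```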
#8
Resident Curmudgeon
Posts: 79,740
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
@Tex2002ans, excellent explanation of the regex used for this search.
#9
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
It accepts arguments
Thread | Thread Starter | Forum | Replies | Last Post |
Regex to find multiple spaces between HTML tags | mikapanja | Editor | 10 | 11-18-2017 07:11 AM |
HTML input plugin stripping text within toc tags in child html file | nimblebooks | Conversion | 3 | 02-21-2012 03:24 PM |
html import remove userdefined Tags | gucky | Calibre | 0 | 11-14-2010 09:35 AM |
Regex help to remove HTML footer | neonbible | Calibre | 4 | 09-09-2010 09:42 AM |
RFE: Remove remove tags in bulk edit | magphil | Calibre | 0 | 08-11-2009 10:37 AM |