Regex examples - Page 53

Pavulon · 03-19-2025, 07:10 PM

Good evening, Haudek,

Got it.
Problem solved.
Thanks for everything.

ElMiko · 03-27-2025, 08:58 PM

Is there a way to match numbers that are between repeated characters?

For example to match the numbers in:

Code:

<p class="lorem">.........6.........</p>

<p class="lorem">____8___</p>

<p class="lorem">----------3-------</p>

Something like:

Code:

<p.*?>[_\-.]*?\K[0-9](?=[_\-.]*?</p>)

Except that would match strings like:

Code:

<p class="lorem">_._--__8..----</p>

and would fail to match strings like:

Code:

<p class="lorem">****8***</p>

when what I'm trying to do is match numbers nested within ANY repeated string of characters.

Karellen · 03-27-2025, 09:30 PM

Quote:

Originally Posted by ElMiko

I assume you want to capture the number to reuse.

Code:

\.+(\d+)\.+

Then change the \. to match whatever other character you want to find.
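For a filler character that is known in advance, the suggestion above could be sketched in Python roughly as follows. The helper name `pattern_for` is made up for illustration, and `re.escape` is an added assumption so that characters such as `.` or `*` are treated literally.

```python
import re

def pattern_for(ch):
    """Build the \\.+(\\d+)\\.+ style pattern for one known filler character."""
    esc = re.escape(ch)  # "." becomes "\.", "*" becomes "\*"
    return re.compile(rf"{esc}+(\d+){esc}+")

print(pattern_for(".").search("<p>.........6.........</p>").group(1))  # 6
print(pattern_for("*").search("<p>****8***</p>").group(1))             # 8
print(pattern_for("_").search("<p>....6....</p>"))                     # None
```

This only helps when the character is known beforehand, since the pattern has to be rebuilt for each one.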

ElMiko · 03-27-2025, 09:42 PM

Quote:

Originally Posted by Karellen

I assume you want to capture the number to reuse.

Code:

\.+(\d+)\.+

Then change the \. to match whatever other character you want to find.

One of the things I'm trying to solve for is when I don't know what the repeating character is, only that it is repeating. It could be a period or a hyphen or the letter "a"... literally anything (except a number).

Karellen · 03-27-2025, 09:49 PM

Quote:

Originally Posted by ElMiko

when what I'm trying to do is match numbers nested within ANY repeated string of characters.

Oops. Missed this last line. Thought it was part of your signature.

Try...

Code:

>\p{P}+(\d+)\p{P}+<

ElMiko · 03-28-2025, 12:30 AM

Quote:

Originally Posted by Karellen

Oops. Missed this last line. Thought it was part of your signature.

Try...

Code:

>\p{P}+(\d+)\p{P}+<

So, this is similar to the second problem I ran into with my sample regex. Namely, it'll match ANY string (repeated or not) with a nested number, e.g.:

Code:

<p class="lorem">_._--__8..----</p>

I'm trying to only find numbers nested between a repeated character, not numbers nested between any characters.

Haudek · 03-28-2025, 03:34 AM

Quote:

Originally Posted by ElMiko

I'm trying to only find numbers nested between a repeated character, not numbers nested between any characters.

I think I understand what you need.

Code:

<p[^>]*>.*?(.).*?\d+.*?\1.*?</p>

IMHO, the key is to use \1 in the search to indicate that you need a part that has already appeared before the number.
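A small Python sketch of what `\1` contributes inside the search pattern, using the regex exactly as posted. The `matches` helper and the `aaa5bbb` input are made up for illustration. It also suggests that the `.*?` padding leaves the pattern looser than a strict "same character on both sides" test.

```python
import re

# (.) captures one character before the number, and \1 demands that
# the same character turns up again somewhere after the number.
haudek = re.compile(r"<p[^>]*>.*?(.).*?\d+.*?\1.*?</p>")

def matches(inner):
    """Wrap the text in a <p> element and report whether the pattern matches."""
    return haudek.search(f'<p class="lorem">{inner}</p>') is not None

print(matches("....1..."))        # True: "." appears on both sides
print(matches("aaa5bbb"))         # False: nothing before the 5 recurs after it
print(matches("_._--__4..----"))  # True: "." recurs, and .*? absorbs the rest
```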

ElMiko · 03-28-2025, 09:31 AM

I almost considered apologizing in advance for the convoluted question, but better late than never: sorry, everybody! I realize this is a little tricky to understand.

Let's say I have the following <p> elements:

Code:

1. <p class="lorem">....1...</p>

2. <p class="lorem">----2---</p>

3. <p class="lorem">___3_____</p>

4. <p class="lorem">_._--__4..----</p>

5. <p class="lorem">aaa5aaaa</p>

I am looking for regex that will match examples 1, 2, 3, and 5, BUT NOT 4.

That is to say, I'm looking to match a <p> element where the number is nested within any string of repeating characters, and then isolate the number for reuse in a replacement function.

@Haudek - I think we're getting close, although I don't think I see any backreference in the first half of the regex...

EDIT1 — SOLVED:

Thanks, Haudek. Your regex got the ball rolling. I'd forgotten that backreferencing works within the search field (not merely in the replace field).

Code:

<p[^>]*>(.)\1*?([0-9]+)\1*?</p>
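As a quick check, the solved regex can be run against the five numbered samples in Python. This is only a sketch: the variable names are made up, and Python's `re` stands in for the editor's own regex engine.

```python
import re

# (.) captures the filler character; \1*? repeats that same character on
# both sides of the number, which ends up in group 2.
solved = re.compile(r"<p[^>]*>(.)\1*?([0-9]+)\1*?</p>")

samples = [
    '<p class="lorem">....1...</p>',
    '<p class="lorem">----2---</p>',
    '<p class="lorem">___3_____</p>',
    '<p class="lorem">_._--__4..----</p>',
    '<p class="lorem">aaa5aaaa</p>',
]

def number_in(line):
    """Return the isolated number, or None when the filler is mixed."""
    m = solved.search(line)
    return m.group(2) if m else None

print([number_in(s) for s in samples])  # ['1', '2', '3', None, '5']
```

Because `(.)` accepts any character, a digit could also act as the filler. Swapping in `(\D)` would rule that out, though that variant is not from the thread.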

ElMiko · 07-26-2025, 08:34 AM

I'm trying to insert a character into a string. Specifically, I'm trying to insert omitted apostrophes into contractions. I fear this is impossible with regex since I'm not actually matching anything—I'm matching the non-space between two characters.

The search I've been using is:

Code:

(?<=\b[Cc]an|\b[DdWw]on|\b[CcWw]ouldn|\b[Ss]houldn|\b[Dd]idn|\b[Ii]sn|\b[Aa]ren|\b[WwHh]asn)t\b

which matches the ending "t" that can then be replaced by "’t". But I'm hoping there's a way to write the search such that I'm functionally inserting the apostrophe rather than replacing the "t".

KevinH · 07-26-2025, 04:47 PM

Why is replacing "t" with "'t" an issue? It is just a text substring replacement. Replacements can be anything.

Also why not let a spellchecker catch those cases?

ElMiko · 07-26-2025, 05:28 PM

Because, KevinH, "wont" and "cant" are actually words, so relying on spellcheck will fail to catch instances where they are genuinely erroneous. It also serves as a mental flag that alerts me that if there are these kinds of apostrophe errors, there may be others (and therefore I should run some of my other "missing apostrophe" searches). As to why I want an insertion rather than an insertion AND replacement: there are other types of missing-apostrophe errors that don't end in "t", and I'd like to combine them into a single search with a universal replacement value, namely a single apostrophe.

But here's the thing, it might simply be more helpful to just consider my question conceptually, rather than practically. Regardless of whether you understand or agree with my reasoning for wanting to create this kind of search, the question is fundamentally about what is possible within regex. Specifically matching (and replacing) liminal spaces between characters.

I actually have an idea that I've used in other contexts, but I'm away from the puter. It involves using (|\s)
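On the conceptual question, a regex can match a position rather than a character: a lookbehind for the word stem plus a lookahead for the final "t" matches the empty gap between them, and the replacement is then just the apostrophe. Below is a minimal Python sketch of that idea, not a tested Sigil search. The names `STEMS`, `GAP` and `insert_apostrophes` are made up, and each stem gets its own lookbehind because Python's `re` only accepts fixed-width lookbehinds.

```python
import re

# Word stems taken from the contraction search in the post above.
STEMS = [r"[Cc]an", r"[DdWw]on", r"[CcWw]ouldn", r"[Ss]houldn",
         r"[Dd]idn", r"[Ii]sn", r"[Aa]ren", r"[WwHh]asn"]

# Zero-width match: a stem behind the position, "t" plus a word boundary ahead.
GAP = re.compile(
    "(?:" + "|".join(rf"(?<=\b{stem})" for stem in STEMS) + r")(?=t\b)"
)

def insert_apostrophes(text, apostrophe="\u2019"):
    """Insert the apostrophe at every matched gap; no character is consumed."""
    return GAP.sub(apostrophe, text)

print(insert_apostrophes("I cant go and she wouldnt say why."))
```

In an engine that accepts the single lookbehind with alternatives as posted, changing the trailing `t\b` to `(?=t\b)` should have the same effect.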

KevinH · 07-26-2025, 05:50 PM

Then use Replace All with Filter Replacements and check only the boxes where the replacement works. Then repeat for a different replacement.

Or

Use a regex to capture all the cases you want and a Python function replace to determine when, where, and what to insert. Then use Filter Replacements on it.
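The function-based approach can be sketched in plain Python with `re.sub` and a callable. This is not Sigil's own function-replace interface, whose signature is not shown in the thread. The names `CANDIDATE`, `AMBIGUOUS`, `decide` and `fix` are hypothetical, as is the policy of leaving "cant" and "wont" untouched for manual review.

```python
import re

# Capture the stem and the trailing "t" separately (stems from the earlier search).
CANDIDATE = re.compile(
    r"\b(can|don|won|couldn|wouldn|shouldn|didn|isn|aren|wasn|hasn)(t)\b",
    re.IGNORECASE,
)

# Assumption: these are real words, so they are left for a human to judge.
AMBIGUOUS = {"cant", "wont"}

def decide(match):
    """Decide per match whether to insert the apostrophe and where."""
    word = match.group(0)
    if word.lower() in AMBIGUOUS:
        return word  # unchanged
    return f"{match.group(1)}'{match.group(2)}"

def fix(text):
    return CANDIDATE.sub(decide, text)

print(fix("He didnt say he wont."))  # He didn't say he wont.
```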

Doitsu · 07-26-2025, 06:25 PM

A quick and dirty solution would be:

Find:

Code:

\b(I|[yY]ou|[hH]e|[sS]he|[iI]t|[wW]e|[tT]hey|[tT]hat|[tT]here|[hH]ere|[wW]hat|[wW]ho|[wW]here|[sS]hould|[cC]ould|[wW]ould|[mM]ust|[mM]ight|[cC]an|[dD]o|[dD]id|[dD]oes|[hH]ad|[hH]as|[hH]ave|[iI]s|[nN]eed|[oO]ught|[wW]as|[wW]ere)(ll|re|ve|nt|m|d|s)\b

Replace:

Code:

\1'\2

It's not a perfect solution though, because it'll replace hell with he'll but also here with he're.

KevinH · 07-26-2025, 06:42 PM

Or use Doitsu's find and then filter the replacements to decide from context which ones to apply and which to skip.

There often is no perfect search and replace, but being shown the possible replacements in a table with user-controlled context is a good way to make sure no mistakes are made.

I have almost given up on replacing one at a time, or using normal Replace All, and instead use Filter Replacements almost exclusively now.

ElMiko · 07-26-2025, 07:00 PM

@Doitsu - to be clear, this isn't a Replace All search. Unlike KevinH, I still prefer cycling through search results individually. I guess I've just learned how to recognize typos in the context of a larger selection of text more efficiently than in the more isolated filter view. Also, selecting and deselecting matches feels less efficient than cycling through matches O.G. style.
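Doitsu's quick-and-dirty find/replace pair from earlier in the thread can be tried out in Python to see the false positives it produces. This is only a sketch: the names `FIND` and `apply_fix` are made up, and Python's `re` stands in for the editor's engine.

```python
import re

# Doitsu's find pattern: a pronoun or verb stem, then a contraction ending.
FIND = re.compile(
    r"\b(I|[yY]ou|[hH]e|[sS]he|[iI]t|[wW]e|[tT]hey|[tT]hat|[tT]here|[hH]ere"
    r"|[wW]hat|[wW]ho|[wW]here|[sS]hould|[cC]ould|[wW]ould|[mM]ust|[mM]ight"
    r"|[cC]an|[dD]o|[dD]id|[dD]oes|[hH]ad|[hH]as|[hH]ave|[iI]s|[nN]eed"
    r"|[oO]ught|[wW]as|[wW]ere)(ll|re|ve|nt|m|d|s)\b"
)

def apply_fix(text):
    """Apply the posted replacement \\1'\\2 to every match."""
    return FIND.sub(r"\1'\2", text)

print(apply_fix("youre"))  # you're
print(apply_fix("hell"))   # he'll  (false positive noted in the post)
print(apply_fix("here"))   # he're  (false positive noted in the post)
print(apply_fix("didnt"))  # did'nt (apostrophe lands before the "n")
```

The last line suggests that, as posted, the `nt` ending puts the apostrophe in the wrong place, which is one more reason to review each match rather than replace blindly.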
