Hello Haudek,
Thank you for taking the time to help me.
Through my own fault, we are having trouble understanding each other.
I never meant inconsistency, but inconstancy (translation?).
I'd like to make it clear that there's no problem with the spacing; it only came from my mishandling (the senility of my 73 years, no doubt...) of the copy-paste between the different editors.
Let me start from the beginning.
I showed an example that works, where the links contain a character string (in this case the non-breaking space) that is inconstantly present. "Inconstant" in French means that sometimes the string is there (once or several times) and sometimes it isn't, as in:
Code:
<p><a href="anchor450" id="for450">450</a>Text1</p>
<p><a href="anchor451" id="for451">451</a>*Text2</p>
<p><a href="anchor452" id="for452">452</a>**Text3</p>
<p><a href="anchor453" id="for453">453</a>***Text4</p>
I was saying that with a capturing group and the quantifier "*" - star, asterisk - (0 or more), I could process all links, even the one where the required string (one or more non-breaking spaces) is absent (empty, null).
Code:
Find:
<p><a href="(.*?)" id="(.*?)">([0-9]{1,4})</a>(*)*(.*?)</p>
Replace:
<p><a href="\1" id="\2">\3</a>\5 something else</p>
This handles all 4 links correctly:
Code:
Result:
<p><a href="anchor450" id="for450">450</a>Text1 something else</p>
<p><a href="anchor451" id="for451">451</a>Text2 something else</p>
<p><a href="anchor452" id="for452">452</a>Text3 something else</p>
<p><a href="anchor453" id="for453">453</a>Text4 something else</p>
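For anyone who wants to reproduce this outside the editor, here is a minimal sketch using Python's re module. It assumes that the "*" shown in the sample text stands for a non-breaking space (U+00A0) and that Python's engine behaves like the editor's for this pattern; the names NBSP, lines, pattern and replacement are only illustrative.

```python
import re

# Assumption: the "*" in the sample stands for a non-breaking space (U+00A0).
NBSP = "\u00a0"

lines = [
    '<p><a href="anchor450" id="for450">450</a>Text1</p>',
    f'<p><a href="anchor451" id="for451">451</a>{NBSP}Text2</p>',
    f'<p><a href="anchor452" id="for452">452</a>{NBSP * 2}Text3</p>',
    f'<p><a href="anchor453" id="for453">453</a>{NBSP * 3}Text4</p>',
]

# Group 4 is the optional non-breaking space, repeated 0 or more times.
pattern = re.compile(
    r'<p><a href="(.*?)" id="(.*?)">([0-9]{1,4})</a>(' + NBSP + r')*(.*?)</p>'
)
replacement = r'<p><a href="\1" id="\2">\3</a>\5 something else</p>'

results = [pattern.sub(replacement, line) for line in lines]
for result in results:
    print(result)
```

Under those assumptions, all four lines are matched, including the first one where the non-breaking space is absent.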
You suggested another regex, which also works.
Code:
<p><a href="(.+?)" id="(.+?)">([0-9]{1,4})</a>[(*)+]*(.+?)</p>
But I have no problem with this one. The quantifier on the searched group works and I agree:
- we could use a [\d+] notation
- we could use a non-capturing group (?: ...) and shift the \x
So here's the problem.
Let's say I have a group of links with a string that may or may not be present, that is, 0 or 1 occurrence.
Of course, I could handle this in several passes; I could always manage, that's not the problem. But intellectually, it still bothers me (I don't know how that's going to be translated... ;-) ).
Here, it's a class, but this string could be something else (epub:type="noteref"), etc.
Code:
<p><a href="anchor450" id="for450">450</a>Text1</p>
<p><a href="anchor451" id="for451">451</a>Text2</p>
<p><a class="backlink" href="anchor452" id="for452">452</a>Text3</p>
<p><a class="backlink" href="anchor450" id="for450">450</a>Text4</p>
So I tried a capturing group that matches an arbitrary string at that position, with the quantifier "?" (0 or 1).
Code:
Find:
<p><a (.+?)? href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>
Replace:
<p><a href="\2" id="\3">\4</a>\5 something else</p>
This did not work: unlike the non-breaking space example above, it only finds links in which this string is present. The string (class...) has been removed, but the text has only been modified in the links that had this string.
Code:
Result:
<p><a href="anchor450" id="for450">450</a>Text1</p>
<p><a href="anchor451" id="for451">451</a>Text2</p>
<p><a href="anchor452" id="for452">452</a>Text3 something else</p>
<p><a href="anchor450" id="for450">450</a>Text4 something else</p>
If I try the capturing group with the quantifier {0,1}:
Code:
Find:
<p><a (.+?){0,1} href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>
Replace:
<p><a href="\2" id="\3">\4</a>\5 something else</p>
Result:
<p><a href="anchor450" id="for450">450</a>Text1</p>
<p><a href="anchor451" id="for451">451</a>Text2</p>
<p><a href="anchor452" id="for452">452</a>Text3 something else</p>
<p><a href="anchor450" id="for450">450</a>Text4 something else</p>
I get the same result: only links where the string is present are matched.
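The two attempts can be compared side by side with a small Python sketch. It assumes Python's re module is close enough to the editor's engine for these patterns; the names tail, patterns, matched and results are only illustrative.

```python
import re

lines = [
    '<p><a href="anchor450" id="for450">450</a>Text1</p>',
    '<p><a href="anchor451" id="for451">451</a>Text2</p>',
    '<p><a class="backlink" href="anchor452" id="for452">452</a>Text3</p>',
    '<p><a class="backlink" href="anchor450" id="for450">450</a>Text4</p>',
]

# Common end of both patterns, starting at the literal space before href.
tail = r' href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>'
patterns = {
    "?": re.compile(r'<p><a (.+?)?' + tail),
    "{0,1}": re.compile(r'<p><a (.+?){0,1}' + tail),
}
replacement = r'<p><a href="\2" id="\3">\4</a>\5 something else</p>'

# Which lines each pattern matches, and what the replacement produces.
matched = {
    name: [p.search(line) is not None for line in lines]
    for name, p in patterns.items()
}
results = {
    name: [p.sub(replacement, line) for line in lines]
    for name, p in patterns.items()
}

for name in patterns:
    print(name, matched[name])
    for result in results[name]:
        print("   ", result)
```

In this sketch both quantifiers behave identically: the two links without the class are left untouched, and the two links with it are rewritten.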
If I try with the quantifier "*" (0 or more, so 1 is possible):
Code:
Find:
<p><a (.+?)* href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>
Nothing is matched at all!
After your posts, I also tested a non-capturing group, with the same result. Only links containing the string are matched.
Code:
Find:
<p><a (?:.+?)? href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>
Replace:
<p><a href="\1" id="\2">\3</a>\4 something else</p>
Result:
<p><a href="anchor450" id="for450">450</a>Text1</p>
<p><a href="anchor451" id="for451">451</a>Text2</p>
<p><a href="anchor452" id="for452">452</a>Text3 something else</p>
<p><a href="anchor450" id="for450">450</a>Text4 something else</p>
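One variant that may be worth testing, offered only as a sketch: in the patterns above, the literal space before href sits outside the optional group, so a link without the extra attribute would need two consecutive spaces after "<a" to match. The Python snippet below assumes that this is the cause and moves that space inside the optional non-capturing group. It has only been reasoned through with Python's re module, so the editor's engine may behave differently.

```python
import re

lines = [
    '<p><a href="anchor450" id="for450">450</a>Text1</p>',
    '<p><a href="anchor451" id="for451">451</a>Text2</p>',
    '<p><a class="backlink" href="anchor452" id="for452">452</a>Text3</p>',
    '<p><a class="backlink" href="anchor450" id="for450">450</a>Text4</p>',
]

# Assumption: the separating space belongs to the optional part, so the
# group is "(?:.+? )?" and href follows it directly.
pattern = re.compile(
    r'<p><a (?:.+? )?href="(.+?)" id="(.+?)">([0-9]{1,4})</a>(.+?)</p>'
)
replacement = r'<p><a href="\1" id="\2">\3</a>\4 something else</p>'

results = [pattern.sub(replacement, line) for line in lines]
for result in results:
    print(result)
```

With this variant, the sketch rewrites all four links, whether or not the class attribute is present.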
It's quite an oddity! (I wonder how that one's going to get translated again...)
Since I see "significant" differences between the various translators (DeepL, Google Translate, QTranslate, etc.), and I can't judge whether they're "substantial" because I don't speak English at all, I'm attaching my French text, which you can run through several translators if you need to clear up any ambiguities. I'm pasting here the version given by DeepL.
But it's not surprising that there are sometimes misunderstandings in exchanges.
Thanks again for your attention.