03-08-2019, 09:17 AM | #1 |
Member
Posts: 16
Karma: 10
Join Date: Dec 2015
Device: sigil
A bug with the regex?
Hello everyone,
I'm trying to use a regex to replace ids dynamically, but I'm getting a wrong result that I don't know how to fix, so I'm asking for help. I've read somewhere on this website that Sigil uses PCRE.
Regex pattern:
Code:
<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>
Replace:
Code:
<a id="\3\9">\2\3\4\6</a>
Text:
Code:
<a id="">Anzil ou Auzil, Jacques d’</a>
Expected result:
Code:
<a id="AnzilJ">Anzil ou Auzil, Jacques d’</a>
Actual result:
Code:
<a id="AJ">Anzil ou Auzil, Jacques d’</a>
Sigil version: 0.9.10.
03-08-2019, 09:57 AM | #2 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Uncheck "Minimal Match" in the search options. That option changes the default greediness of many operators.
It won't affect this particular search, but you might also want to make \W, \w (and the like) Unicode-aware by putting (*UCP) at the very start of the expression if the strings you're matching contain non-ASCII characters and you rely on word boundaries.
Last edited by DiapDealer; 03-08-2019 at 10:06 AM.
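Python's re module is not PCRE, but it illustrates the same distinction: for str patterns, \w is Unicode-aware by default (roughly what (*UCP) switches on in PCRE), while the re.ASCII flag limits it to ASCII letters, digits and underscore. A minimal sketch using one of the names from this thread:

```python
import re

# "Andr\u00e9*?" is "André*?" with a precomposed é (U+00E9).
name = "Andr\u00e9*?"

# ASCII-only \w: "é" counts as a non-word character, so the run stops early.
ascii_words = re.findall(r"\w+", name, flags=re.ASCII)

# Unicode-aware \w (Python's default for str patterns): "é" is a word character.
unicode_words = re.findall(r"\w+", name)

print(ascii_words)    # ['Andr']
print(unicode_words)  # ['André']
```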
03-08-2019, 10:05 AM | #3 |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
FWIW, there is no single official standard for regular expressions, especially for the more advanced features. PCRE now differs even from Perl, Java's flavor is different again, etc. Yes, we use the PCRE version 1 library inside Sigil.
Have you tried removing the square brackets around the [\w]+ part of your regular expression? You are not selecting characters from a set there; \w is already a set. Also, what controls the greediness of the remaining letters of "Anzil"? And what exactly are you trying to do with capture groups 4 and 5? With the * in there, couldn't they easily match the "nzil" part?
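To see what each capture group in the original find expression actually holds with Minimal Match off, the groups can be dumped with Python's re module. This is a sketch that assumes Python's greedy quantifiers behave like PCRE's for this sample; it does not reproduce Sigil's Minimal Match mode. Here groups 4 and 5 take no part in the match (None), and group 9 ends up as the last capital letter because the greedy (.*) in group 8 runs as far right as it can:

```python
import re

# The find expression from the first post, unchanged (greedy quantifiers, i.e. Minimal Match off).
FIND = r'<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>'
SAMPLE = '<a id="">Anzil ou Auzil, Jacques d\u2019</a>'

m = re.search(FIND, SAMPLE)
for number in range(1, 11):
    # Groups that did not take part in the match print as None.
    print(number, repr(m.group(number)))

# [\w]+ and \w+ match the same text: \w is already a character set.
bracketed = re.match(r"[\w]+", "Anzil ou Auzil").group()
plain = re.match(r"\w+", "Anzil ou Auzil").group()
print(bracketed == plain)  # True
```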
03-08-2019, 04:50 PM | #4 |
Wizard
Posts: 1,548
Karma: 6613969
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
This worked for me:
Find:
Code:
<a(.*?)>([^\s]+)
Replace:
Code:
<a id="\2">\2
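A Python sketch of this find/replace, with Python's \2 group reference standing in for Sigil's replace syntax. It puts only the surname in the id, and because [^\s]+ stops only at whitespace, a trailing comma ends up in the id for a name written like "Anzil, Jacques d’":

```python
import re

# Find/replace from post #4, written for Python's re.sub.
FIND = r'<a(.*?)>([^\s]+)'
REPLACE = r'<a id="\2">\2'

samples = [
    '<a id="">Anzil ou Auzil, Jacques d\u2019</a>',
    '<a id="">Anzil, Jacques d\u2019</a>',
]
results = [re.sub(FIND, REPLACE, s) for s in samples]
for r in results:
    print(r)
# <a id="Anzil">Anzil ou Auzil, Jacques d’</a>
# <a id="Anzil,">Anzil, Jacques d’</a>
```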
03-08-2019, 05:25 PM | #5 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
The OP's original worked just fine for me. They just have to uncheck the Minimal Match option and they'll get the exact results they expect.
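The same check for the complete find/replace pair from the first post, using Python's re as a stand-in for PCRE with Minimal Match unchecked (Python has no global ungreedy switch, so only the greedy case is shown). It needs Python 3.5 or later, where a group that didn't participate, like \4 here, is replaced with an empty string:

```python
import re

# The find and replace expressions from the first post, unchanged.
FIND = r'<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>'
REPLACE = r'<a id="\3\9">\2\3\4\6</a>'

text = '<a id="">Anzil ou Auzil, Jacques d\u2019</a>'
result = re.sub(FIND, REPLACE, text)
print(result)  # <a id="AnzilJ">Anzil ou Auzil, Jacques d’</a>
```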
03-08-2019, 05:41 PM | #6 |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
Which is why group 3 grabbed just the first letter, and the next two groups grabbed the rest of "Anzil".
03-08-2019, 06:04 PM | #7 | |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
I try not to use the Minimal Match option at all. It might work fine for a very basic expression, but having it checked for a complex one will always get you in trouble (it does me, anyway). I've found it's better to work out an expression that does what I want with the operators' default greediness than to try to alter that greediness.
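In practice that means leaving Minimal Match off and making only the quantifiers that need it lazy by adding ? to them, as post #4 does with <a(.*?)>. A small Python sketch of the difference (greedy and lazy quantifiers behave the same way in Python's re and PCRE for a case this simple):

```python
import re

line = '<a id="x">Anzil</a>'

# Greedy .* runs to the end of the line, then backs off to the LAST ">".
greedy = re.search(r'<a(.*)>', line).group(1)

# Lazy .*? (the "?" is added to this one quantifier only) stops at the FIRST ">".
lazy = re.search(r'<a(.*?)>', line).group(1)

print(repr(greedy))  # ' id="x">Anzil</a'
print(repr(lazy))    # ' id="x"'
```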
03-11-2019, 02:00 AM | #8 |
Member
Posts: 16
Karma: 10
Join Date: Dec 2015
Device: sigil
Hello guys, sorry for the late answer.
I've already unchecked Minimal Match, but it still doesn't work. These are the cases I need to match:
Code:
<a id="">Anzil, Jacques d’</a>
<a id="">Anzil, sm Jacques d’</a>
<a id="">Anzil</a>
<a id="">? Anzil ou Auzil, Jacques d’</a>
<a id="">◊ Andelot, Pierre d’</a>
<a id="">◊ ? Angennes, Claude d’</a>
<a id="">Andrier*? André*?, Jean-Jacques</a>
<a id="">Andreas de Francia</a>
<a id="">Ancienville*?, Jean</a>
After a long search, I still can't find the right way to do it.
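If several of these anchors sit on the same line of the XHTML, the unbounded (.*) groups in the original find expression can run across anchor boundaries. The sketch below uses Python's re as a stand-in for PCRE; the BOUNDED pattern is a hypothetical tweak, not something proposed in the thread, that replaces those groups with [^"]* and [^<]* so a match cannot leave a single anchor:

```python
import re

REPLACE = r'<a id="\3\9">\2\3\4\6</a>'

# Original find expression: (.*) in groups 1, 8 and 10 can cross tag boundaries.
ORIGINAL = r'<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>'

# Hypothetical tweak: same group numbering, but each group is bounded to one anchor.
BOUNDED = r'<a id="([^"]*)">([\W\s]*)?(\w+)( ([ ?*])+)?((, )?([^<]*)?([A-Z])([^<]*))?</a>'

line = '<a id="">Anzil</a> <a id="">Andreas de Francia</a>'

print(re.sub(ORIGINAL, REPLACE, line))
# <a id="AndreasF">Andreas de Francia</a>   (the first anchor is lost)
print(re.sub(BOUNDED, REPLACE, line))
# <a id="Anzil">Anzil</a> <a id="AndreasF">Andreas de Francia</a>
```

For these two anchors, the bounded version gives the same ids the original gives when each anchor is on a line of its own.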
03-11-2019, 06:22 AM | #9 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I don't know what to tell you. It certainly works for the first (and only) sample string you gave us in your first post; I copied and pasted your expressions and sample to verify. That means there's no bug. If it doesn't work for all of the strings you encounter, your expression is insufficient. Are you reporting a bug, or asking for help with an expression?
03-11-2019, 06:39 AM | #10 |
Member
Posts: 16
Karma: 10
Join Date: Dec 2015
Device: sigil
I'm not reporting a bug; I'm asking for help with the regex, if someone can help me. And many thanks for the answers you've all given so far.
I'd also like to know which version of PCRE is used in 0.9.10, if possible. I want to find a tool like https://regex101.com to test my pattern. Or is there already a script for that?
03-11-2019, 08:29 AM | #11 | |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
The PCRE version in Sigil is 8.37.
03-11-2019, 11:13 AM | #12 | |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Actually, the only failure I get with your F&R expressions is with the second sample above:
Code:
<a id="">Anzil, sm Jacques d’</a>
Quote:
Code:
<a id="Anzils">Anzil, sm Jacques d’</a>
Code:
<a id="AnzilJ">Anzil, sm Jacques d’</a>
Keep in mind that you probably shouldn't rely on this sort of thing to generate ids in the first place. Anchor ids in the same document need to be unique, and no regex F&R can guarantee uniqueness when it relies entirely on slicing and concatenating strings present in the match.
Last edited by DiapDealer; 03-11-2019 at 11:16 AM.
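One way around the uniqueness problem is to do the replacement in a script that remembers which ids it has already produced. A minimal Python sketch, assuming anchors shaped like the samples in post #8 and a hypothetical id scheme: the surname plus the last capital letter in the rest of the entry (mirroring what the original greedy expression produces for the first sample), with -2, -3, ... appended to repeats. fill_ids is an illustrative name, not a Sigil feature. It only tracks the ids it generates, so it does not check for clashes with ids already in the document:

```python
import re

# Hypothetical pattern for anchors like the samples in post #8:
# leading symbols/punctuation, a surname, then the rest of the entry.
ANCHOR = re.compile(r'<a id="">([^\w<]*)(\w+)([^<]*)</a>')


def fill_ids(html: str) -> str:
    """Fill empty ids as surname + last capital letter, de-duplicating with a counter."""
    seen = {}  # base id -> how many times it has been used so far

    def repl(m):
        lead, surname, rest = m.groups()
        capitals = re.findall(r"[A-Z]", rest)
        base = surname + (capitals[-1] if capitals else "")
        count = seen.get(base, 0) + 1
        seen[base] = count
        # The first use keeps the plain id; repeats get a hypothetical "-2", "-3", ... suffix.
        new_id = base if count == 1 else f"{base}-{count}"
        return f'<a id="{new_id}">{lead}{surname}{rest}</a>'

    return ANCHOR.sub(repl, html)


html = '<a id="">Anzil ou Auzil, Jacques d\u2019</a>\n<a id="">Anzil, Jacques d\u2019</a>'
print(fill_ids(html))
# <a id="AnzilJ">Anzil ou Auzil, Jacques d’</a>
# <a id="AnzilJ-2">Anzil, Jacques d’</a>
```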