03-08-2019, 09:17 AM   #1
hantsaniala
Member
A bug with the regex?

Hello everyone,

I'm trying to use a regex for dynamic id replacement, but I'm getting a result that I don't know how to fix, so I'm asking for help.

I've read somewhere on this site that Sigil uses PCRE.

Regex pattern:
Code:
<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>
Replace pattern:
Code:
<a id="\3\9">\2\3\4\6</a>
Test string:
Code:
<a id="">Anzil ou Auzil, Jacques d’</a>
The result I want:
Code:
<a id="AnzilJ">Anzil ou Auzil, Jacques d’</a>
But the result I got:
Code:
<a id="AJ">Anzil ou Auzil, Jacques d’</a>
Note that this regex works on https://regex101.com and https://regexr.com.
Sigil version: 0.9.10.

03-08-2019, 09:57 AM   #2
DiapDealer
Grand Sorcerer
Uncheck "Minimal Match" in the search options. That option changes the default greediness of many operators.

It won't affect this particular search, but you might also want to make \W, \w (and the like) Unicode-aware with (*UCP) if the strings you're matching contain non-ASCII characters.
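
To illustrate the Unicode point, here is a rough analogue in Python rather than in PCRE syntax. It assumes that Python's `re.ASCII` flag stands in for the ASCII-only behaviour of \w, and that Python's default Unicode-aware matching stands in for what (*UCP) enables. The sample string is adapted from names that appear in this thread.

```python
import re

# "\u00e9" is the precomposed letter "é".
name = "Andr\u00e9*?, Jean-Jacques"

# Python 3 str patterns are Unicode-aware by default: \w matches "é".
unicode_word = re.match(r"\w+", name).group(0)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the match stops before "é".
ascii_word = re.match(r"\w+", name, re.ASCII).group(0)

print(unicode_word)  # André
print(ascii_word)    # Andr
```

This is only a sketch of the idea; the exact behaviour inside Sigil depends on its PCRE build and options.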

Last edited by DiapDealer; 03-08-2019 at 10:06 AM.
03-08-2019, 10:05 AM   #3
KevinH
Sigil Developer
FWIW, there is no one official standard for regular expressions, especially for the more advanced features. PCRE now differs even from Perl, Java is different again, etc. Yes, we use the PCRE version 1 library inside Sigil.

Have you tried removing the square brackets around the [\w]+ part of your regular expression? You are not selecting characters from a set there; \w is already a set. Also, what controls the greediness of the remaining letters of Anzil? And what exactly are you trying to do with capture groups 4 and 5? With the * there, couldn't it easily match the nzil part?
03-08-2019, 04:50 PM   #4
RbnJrg
Wizard
This worked for me:

Find:
Code:
<a(.*?)>([^\s]+)
Replace:
Code:
<a id="\2">\2
03-08-2019, 05:25 PM   #5
DiapDealer
Grand Sorcerer
The OP's original worked just fine for me. They just have to uncheck the Minimal Match option and they'll get the exact results they expect.
03-08-2019, 05:41 PM   #6
KevinH
Sigil Developer
Which is why group 3 grabbed just the first letter and the next two groups grabbed the rest of Anzil
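
The effect described here can be shown in miniature with Python's `re` module. This is a simplified sketch, not the original expression: it assumes that Minimal Match makes quantifiers lazy, and it writes the lazy quantifier (`+?`) explicitly.

```python
import re

text = "Anzil ou Auzil"

# Greedy \w+ takes the whole first word.
greedy = re.match(r"(\w+)(.*)", text).groups()

# Lazy \w+? takes as little as possible, so a later group picks up the rest.
lazy = re.match(r"(\w+?)(.*)", text).groups()

print(greedy)  # ('Anzil', ' ou Auzil')
print(lazy)    # ('A', 'nzil ou Auzil')
```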
03-08-2019, 06:04 PM   #7
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by KevinH View Post
Which is why group 3 grabbed just the first letter and the next two groups grabbed the rest of Anzil
Yep.

I try not to use the Minimal Match option at all. It might work fine for a very basic expression, but having it checked for a complex one will always get you in trouble (it does me, anyway). I've found it's best to work out an expression that does what I want with the default greediness of operators rather than try to alter their default greediness.
03-11-2019, 02:00 AM   #8
hantsaniala
Member
Hello guys, sorry for the late answer.

I've already unchecked Minimal Match, but it still doesn't work.

These are the existing cases that I need to match:

Code:
<a id="">Anzil, Jacques d’</a>
<a id="">Anzil, sm Jacques d’</a>
<a id="">Anzil</a>
<a id="">? Anzil ou Auzil, Jacques d’</a>
<a id="">◊ Andelot, Pierre d’</a>
<a id="">◊ ? Angennes, Claude d’</a>
<a id="">Andrier*? André*?, Jean-Jacques</a>
<a id="">Andreas de Francia</a>
<a id="">Ancienville*?, Jean</a>
So for the identifier, I need the first word as the main ID, followed by the first letter of the first word after the comma; if there is no comma, I need the next uppercase letter instead.

After a long search, I still can't find the right way to do it.
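
For reference, the rule described above can be sketched as ordinary Python, outside Sigil's Find & Replace. This is only one literal reading of the rule, and `make_id` is a hypothetical helper name, not something Sigil provides. Under this reading, the "sm" in the second sample contributes "s".

```python
import re

def make_id(text: str) -> str:
    """Build an id from the link text (hypothetical helper)."""
    # First run of word characters, skipping markers such as "◊" or "?".
    first = re.search(r"\w+", text)
    if first is None:
        return ""
    rest = text[first.end():]
    if "," in rest:
        # First word character after the first comma.
        after = re.search(r"\w", rest[rest.index(",") + 1:])
        suffix = after.group(0) if after else ""
    else:
        # No comma: next uppercase letter after the first word, if any.
        suffix = next((ch for ch in rest if ch.isupper()), "")
    return first.group(0) + suffix

samples = [
    "Anzil, Jacques d’",
    "Anzil, sm Jacques d’",
    "Anzil",
    "◊ ? Angennes, Claude d’",
    "Andreas de Francia",
]
for s in samples:
    print(make_id(s), "<-", s)
```

Splitting the rule into two small steps avoids having to encode the comma/no-comma branch in a single expression.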
03-11-2019, 06:22 AM   #9
DiapDealer
Grand Sorcerer
I don't know what to tell you. It certainly works for the first (and only) sample string you gave us in your first post. I copy/pasted your expressions/sample to verify. Which means there's no bug. If it doesn't work for all of the strings you encounter, it means your expression is insufficient. Are you reporting a bug, or asking for help with an expression?
03-11-2019, 06:39 AM   #10
hantsaniala
Member
I'm not reporting a bug; I'm asking for help with the regex, if someone can help me. Many thanks for the answers you have all given so far.

I'd also like to know which PCRE version is used in Sigil 0.9.10, if possible. I want to find a tool like https://regex101.com to test my pattern. Or is there already a script for that?

03-11-2019, 08:29 AM   #11
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by khadafi View Post
I want to find a tool like https://regex101.com to test my pattern. Or is there already a script for that ?

If you leave all the optional checkboxes unchecked, most expressions you use in Sigil should behave similarly in any PCRE engine/test.

The PCRE version in Sigil is 8.37
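
As a rough way to try the expressions outside Sigil, they can also be run through Python's `re` module. This is an approximation only: Python's engine is not PCRE 8.37, and its \w is Unicode-aware by default. The function name `run_find_replace` is made up for this sketch.

```python
import re

# Find and Replace expressions copied from the first post.
FIND = r'<a id=\"(.*)?\">([\W\s]*)?([\w]+)( ([ \?*])+)?((, )?(.*)?([A-Z])(.*))?<\/a>'
REPLACE = r'<a id="\3\9">\2\3\4\6</a>'

def run_find_replace(line: str) -> str:
    # Groups that took no part in the match expand to "" (Python 3.5+).
    return re.sub(FIND, REPLACE, line)

print(run_find_replace('<a id="">Anzil ou Auzil, Jacques d’</a>'))
# <a id="AnzilJ">Anzil ou Auzil, Jacques d’</a>
```

All quantifiers here keep their default greediness, which corresponds to leaving the optional checkboxes unchecked.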
03-11-2019, 11:13 AM   #12
DiapDealer
Grand Sorcerer
Actually, the only failure I experience with your F&R expressions is with the second sample match above:
Code:
<a id="">Anzil, sm Jacques d’</a>
According to your desired result:
Quote:
So for the identifier, I need to find the first word as the main ID and the first letter of the first word after the comma or if the comma doesn't exist, I need to find the next uppercase letter for it.
the above should produce:
Code:
<a id="Anzils">Anzil, sm Jacques d’</a>
but it in fact produces:
Code:
<a id="AnzilJ">Anzil, sm Jacques d’</a>
Everything else seems to produce your expected results.

Keep in mind that you probably shouldn't rely on this sort of thing to generate ids in the first place. If they're in the same document, anchor ids need to be unique. And there's no way any regexp F&R is going to be able to guarantee uniqueness when relying utterly upon the slicing/concatenation of strings present in the match.
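
One possible way around the uniqueness problem is a separate pass outside Find & Replace. The sketch below is plain Python and assumes the generated ids have already been collected in document order. `uniquify` is a hypothetical helper, and the "-2", "-3" suffix scheme is just one arbitrary choice.

```python
def uniquify(ids):
    """Return the ids with numeric suffixes added to any repeats."""
    used = set()
    result = []
    for base in ids:
        candidate, n = base, 1
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            n += 1
            candidate = f"{base}-{n}"
        used.add(candidate)
        result.append(candidate)
    return result

print(uniquify(["AnzilJ", "AndelotP", "AnzilJ", "AnzilJ"]))
# ['AnzilJ', 'AndelotP', 'AnzilJ-2', 'AnzilJ-3']
```

Any links that point at the renamed anchors would need the same mapping applied, which this sketch does not attempt.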

Last edited by DiapDealer; 03-11-2019 at 11:16 AM.