02-06-2018, 04:02 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2013
Location: Europe
Device: Mediapad M3
|
Regex Search Repetition doesn't work?!
In the past I was using repetition searches like
Code:
[a-zA-Z]* [0-9]{1,}
I've tested with several patterns, like:
{0,}, which is the same as * --> now finds nothing
{1,}, which is the same as + --> now finds exactly 1 character
{2,4} --> now finds exactly 2 and not the expected 2..4 characters
So it behaves very strangely and only finds the number of characters before the comma, but no range...
At first I suspected Sigil and installed the previous version 0.9.8 in parallel (yes, that works, even if it requires renaming some folders), but it showed exactly the same behavior.
As far as I've seen, regex is handled by the PCRE engine, but that doesn't seem to depend on Java versions etc., right?
Any idea where to look? Thanks for any hint!
Last edited by Wasserpulle; 02-09-2018 at 04:25 AM. Reason: problem solved |
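The equivalences mentioned in the first post ({0,} being the same as *, {1,} the same as +) can be checked outside Sigil. Below is a minimal sketch using Python's `re` module, which is not PCRE but treats these quantifiers the same way under default (greedy) matching. The sample string and the `first_match` helper are made up for illustration and do not come from the thread.

```python
import re

# Hypothetical sample text, not taken from the thread.
sample = "chapter 2018 ends"


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Default (greedy) matching: each brace form should agree with its shorthand.
results = {
    "{1,}": first_match(r"[a-zA-Z]* [0-9]{1,}", sample),
    "+": first_match(r"[a-zA-Z]* [0-9]+", sample),
    "{0,}": first_match(r"[a-zA-Z]{0,}", sample),
    "*": first_match(r"[a-zA-Z]*", sample),
    "{2,4}": first_match(r"[0-9]{2,4}", sample),
}

for label, matched in results.items():
    print(label, repr(matched))
```

If a tool returns shorter matches than these for the same patterns, something is switching the quantifiers away from their default greedy behavior.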
02-06-2018, 08:07 AM | #2 |
Sigil Developer
Posts: 7,642
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Perhaps some other app on Windows 10 has a different or broken pcre library someplace in your path?
|
02-06-2018, 09:22 AM | #3 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
With PCRE being statically built and linked into the Windows Sigil binary, is that even possible?
I'm assuming, of course, that the OP is using the official released version of Sigil and hasn't custom compiled Sigil using system libraries. |
02-06-2018, 10:21 AM | #4 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
[0-9]{1,} works on my Windows and Linux machines.
Edit: The regex works, but only if the cursor is positioned before a letter.
Last edited by Doitsu; 02-17-2018 at 02:31 PM. |
|
02-06-2018, 10:33 AM | #5 | |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2013
Location: Europe
Device: Mediapad M3
|
Quote:
I don't have anything using a PCRE lib in my path. Well, there is one other program, but it hasn't even been started since I installed it. But there seems to be another problem if Doitsu is having a similar issue, even if it is just a subset of mine? |
|
02-06-2018, 01:08 PM | #6 |
Sigil Developer
Posts: 7,642
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Tried the following on a Mac and it worked exactly as expected:
Code:
The regular expression used: an:\s[a-zA-Z]*
Code:
The file to search in:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>huh</title>
</head>
<body>
<p> There is an: B C 1 2 3</p>
<p> There is an: 1 2 3</p>
</body>
</html>
So regular expressions seem to work just fine on a Mac. Would someone with access to Windows please try this exact example and let me know if it works correctly or not.
Last edited by KevinH; 02-06-2018 at 01:27 PM. |
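For anyone without Sigil at hand, the same expression can be tried against the two paragraph texts from the test file. This is only a sketch with Python's `re` module rather than Sigil's bundled PCRE, so it shows what default greedy matching should return, not what Sigil does on any given machine. The variable names are illustrative.

```python
import re

# The expression from the test above.
pattern = re.compile(r"an:\s[a-zA-Z]*")

# The two paragraph texts from the test file.
paragraphs = [
    "There is an: B C 1 2 3",
    "There is an: 1 2 3",
]

# Greedy matching: "an:", one whitespace character, then as many letters as possible.
matches = [pattern.search(p).group(0) for p in paragraphs]

for found in matches:
    print(repr(found))
# prints 'an: B' and then 'an: '
```

In the second paragraph the letter class matches zero characters, which is allowed by *, so the match ends right after the space.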
02-06-2018, 01:13 PM | #7 |
Sigil Developer
Posts: 7,642
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Shouldn't be possible if statically linked.
But then I am out of ideas. A locale issue? |
02-06-2018, 01:15 PM | #8 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'll do some testing when I get home.
I haven't looked at the config for the bundled PCRE in a long, long time, so I can't remember if there are differences in how it's configured on Mac vs Win/Lin. I do know that nothing has changed on the bundled PCRE front for a very long time, though.
EDIT: I guess I made a very minor change two years ago to clean up some Windows compiler warnings, but I really can't imagine that being the issue. I'll certainly check to make sure, though.
Last edited by DiapDealer; 02-06-2018 at 01:28 PM. |
02-06-2018, 01:26 PM | #9 |
Sigil Developer
Posts: 7,642
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Tried the following on a Mac and it worked just fine (i.e. it was greedy, as expected):
Code:
regular expression was 3{2,4}
Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<p> </p>
p> There is an: B C 1 2 3</p>
<p> There is an: 1 2 333</p>
</body>
</html>
Would someone please try this on Windows and let me know if it works correctly or not.
Last edited by KevinH; 02-06-2018 at 03:50 PM. |
02-06-2018, 02:08 PM | #10 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2013
Location: Europe
Device: Mediapad M3
|
Tried that search with your XML code (btw you missed the < in the beginning) under Windows and ran the search again, with regex enabled of course.
Result: it found exactly the first "33" of the string "333". Looking forward to the results of DiapDealer. |
02-06-2018, 02:23 PM | #11 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
*, + and {min,max} repetition should all be greedy by default. Are you certain you're adding or changing nothing in an attempt to affect the default greediness of Kevin's tests?
|
02-06-2018, 02:27 PM | #12 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
|
|
02-06-2018, 03:19 PM | #13 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I get the same (and expected) results as KevinH and Doitsu on Windows 10.
@Wasserpulle: If Kevin's second test is only matching "33" for you, I'm guessing you have the Minimal Match option checked. That overrides PCRE's default greediness behavior (including the greediness of repetition). If you uncheck it, you should get the same results as us.
That could also easily explain the results you describe in the very first post:
With the cursor in front of a string of alpha characters, [a-zA-Z]* isn't going to match anything when Minimal Match is checked, because the minimal match of an expression that's allowed to return nothing will always be nothing.
{1,} should only match the first occurrence if Minimal Match is checked.
Same with {2,4}: with Minimal Match checked, it will never match anything other than the first two occurrences of the criteria. 3{2,4} with Minimal Match checked is essentially the same as searching for "33" -- which is why that's what it matches for you.
Last edited by DiapDealer; 02-06-2018 at 06:30 PM. |
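The effect described above can be approximated outside Sigil. This sketch assumes that Minimal Match behaves as if every quantifier were lazy (roughly what PCRE's ungreedy option does); how Sigil actually implements the checkbox is not confirmed here. Python's `re` has no such global switch, so the lazy forms are written by hand with a trailing `?`. The string "abc 123" is a made-up sample.

```python
import re

# Paragraph text from Kevin's second test.
text = "There is an: 1 2 333"

greedy_range = re.search(r"3{2,4}", text).group(0)   # takes as many 3s as allowed
lazy_range = re.search(r"3{2,4}?", text).group(0)    # stops at the minimum of two

# Hypothetical sample, searched from the start as if the cursor sat before the letters.
letters = "abc 123"

greedy_star = re.match(r"[a-zA-Z]*", letters).group(0)   # all leading letters
lazy_star = re.match(r"[a-zA-Z]*?", letters).group(0)    # minimum is nothing at all

greedy_plus = re.search(r"[0-9]{1,}", letters).group(0)  # the whole run of digits
lazy_plus = re.search(r"[0-9]{1,}?", letters).group(0)   # just the first digit

print(repr(greedy_range), repr(lazy_range))
print(repr(greedy_star), repr(lazy_star))
print(repr(greedy_plus), repr(lazy_plus))
```

The lazy column lines up with the symptoms in the first post: nothing for {0,}, one character for {1,}, and exactly two for {2,4}.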
02-06-2018, 03:52 PM | #14 | |
Sigil Developer
Posts: 7,642
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
Last edited by KevinH; 02-06-2018 at 03:55 PM. |
|
02-06-2018, 05:55 PM | #15 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
It is probably a locale issue. Try a semicolon instead, so: {1;2}
|