#1
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Indefinite length lookbehind
I'm trying to find instances of the following string:
Code:
said. “
I want to exclude matches where the string is preceded by a word that is itself preceded by a closing curly quotation mark, e.g.
Code:
” Jack said. “
Code:
(?<!”\s\w+?\s)said\. “
I was under the impression that 2.4.2's regex natively allows for indefinite length lookbehinds. What am I doing wrong? Is there a different syntax that needs to be used for indefinite length lookbehinds?
Last edited by ElMiko; Today at 05:15 AM.
#2
A Hairy Wizard
Posts: 3,313
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Try giving it a batch of characters to choose from, enclosed in square brackets?
Code:
(?<!”[\s\w\.]+)said\. “
Edit: You might even try replacing the \. in the negative lookbehind pattern with \p{P} to catch any punctuation instead of just periods.
Code:
(?<!”[\s\w\p{P}]+)said\. “
Last edited by Turtle91; Today at 07:17 AM.
#3
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
@Turtle91 - I never cried as a child, and I started reciting sonnets in the natal ward.
Unfortunately, this syntax doesn't work either. As with my attempt, the element that breaks it is the quantifier "+", which is basically the bit of the search that is supposed to make it indefinite in length! The problem I'm trying to solve is that the OCR misread many commas as periods, resulting in text like:
Code:
He turned as Charles said. “Howdy!"
Code:
“Let's go,” Charles said. “I think I'm done here.”
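Where the engine refuses the variable-length lookbehind, one engine-independent way to express the same exclusion is to match the optional "closing quote + word" prefix and then decide in code whether to touch the match. A minimal sketch in Python's standard `re` module (not Sigil's PCRE2); the name `fix_said` is hypothetical, and changing the period to a comma is an assumption about the intended fix:

```python
import re

# Group 1 captures the "closing quote + word" prefix when it is present.
# The prefix is matched and inspected instead of being asserted, so no
# variable-length lookbehind is needed.
PATTERN = re.compile(r"(”\s\w+\s)?said\. “")

def fix_said(text: str) -> str:
    """Turn 'said. “' into 'said, “' unless it follows '” Word '."""
    def repl(m: re.Match) -> str:
        if m.group(1):          # preceded by ” + word: leave untouched
            return m.group(0)
        return "said, “"        # assumed fix: period back to comma
    return PATTERN.sub(repl, text)

print(fix_said("He turned as Charles said. “Howdy!\""))
print(fix_said("“Let's go,” Charles said. “I think I'm done here.”"))
```

The first line comes back with `said, “` and the second is returned unchanged, because its match starts at the closing quote and group 1 is set.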
#4
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Hmmm, I found this...
But with all respect to the author, I can't make heads or tails of the explanation... much less how to apply it to anything other than matching the letter "X"... Just as importantly, I can't even get it to match the letter "X" in any given Sigil file...
Last edited by ElMiko; Today at 08:07 AM.
#5
Sigil Developer
Posts: 8,485
Karma: 5703586
Join Date: Nov 2009
Device: many
See what the PCRE2 maintainer had to say when he implemented this in 2023 here:
https://github.com/PCRE2Project/pcre2/issues/269
It seems the PCRE2 approach requires a bounded maximum in the lookbehind (a range quantifier such as {min,max}) and not an open-ended +
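For engines that only accept fixed-width lookbehinds, a bounded negative lookbehind can be emulated by chaining one fixed-width negative lookbehind per allowed length, since every one of them has to hold. A rough sketch using Python's standard `re` module, which has that fixed-width restriction; the helper name `bounded_negative_lookbehind` and the sample lines are illustrative only, and this says nothing about how PCRE2 implements it internally:

```python
import re

def bounded_negative_lookbehind(max_len: int) -> re.Pattern:
    # One fixed-width guard per word length 1..max_len. Chained negative
    # lookbehinds are ANDed, which is equivalent to a single
    # (?<!”\s\w{1,max_len}\s) in an engine that accepts bounded ranges.
    guards = "".join(rf"(?<!”\s\w{{{n}}}\s)" for n in range(1, max_len + 1))
    return re.compile(guards + r"said\. “")

pattern = bounded_negative_lookbehind(10)

lines = [
    "Lorem “ipsum dolor” Jack said. “",      # excluded: ” + word precedes
    "He turned as Charles said. “Howdy!\"",  # matched
]
hits = [line for line in lines if pattern.search(line)]
print(hits)
```

Only the second line ends up in `hits`; the first is rejected by the guard for a four-letter word.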
#6
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Quote:
Code:
(?<!”\s\w{1,10}\s)said\. “
#7
Sigil Developer
Posts: 8,485
Karma: 5703586
Join Date: Nov 2009
Device: many
What does the exact error message say? If you mouse over the find field or the valid-regex symbol, does it show the exact error message?
Try a character range, not a word range. Did that change the error?
#8
Sigil Developer
Posts: 8,485
Karma: 5703586
Join Date: Nov 2009
Device: many
I checked the PCRE2 source for changes and saw this:
Quote:
Are you using an assertion properly? A more specific error message might help if you can get one.
#9
Grand Sorcerer
Posts: 5,683
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
@ElMiko A long time ago I created a throw-away Sigil regex tester validation plugin that should theoretically work for your regex.
After the installation you'll find the plugin under Plugins > Validation > RegexTester. (You'll need to select the "regex" engine.) In my test case:
Code:
<p>Lorem “ipsum dolor” Jack said. “</p>
<p>Lorem ipsum dolor said. “</p>
<p>Dolor amet said. “</p>
Code:
(?<!”\s\w+\s)said\. “
#10
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Quote:
I've tried several permutations of the regex:
Code:
\w \S \u \l \D [a-z] .
@Doitsu - Yeah, I don't know what's going on.
#11
Well trained by Cats
Posts: 30,911
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
\S is not \s
\S matches any character that is not a space char
#12
Addict
Posts: 391
Karma: 65460
Join Date: Jun 2011
Device: Kindle
I know that, theducks. That's why I used it. Non-space, followed by min/max range, followed by space. But even if it were a mistake, the point is that the STRUCTURE is being interpreted as invalid.
But also, even if it had been a mistake, it wouldn't explain why the other variants aren't working either.
Last edited by ElMiko; Today at 07:26 PM.
#13
Sigil Developer
Posts: 8,485
Karma: 5703586
Join Date: Nov 2009
Device: many
It is our long holiday weekend in Canada, but I finally got some time to test things in Sigil on my only laptop up here at my cottage. That laptop has a pre-release version of the forthcoming Sigil v2.50.
I decided to test the example cited in one of the issues posted at PCRE2 in that link I posted earlier. In my xhtml file I have:
Code:
<p> 0xxxy </p>
Code:
(?<=0x{1,6})y
Would you please try this test with your Sigil 2.4.2 and let me know if you get the same thing? Perhaps there was a bug in PCRE2 10.44 that got fixed in PCRE2 10.45, which is in the upcoming release of Sigil. I will try Doitsu's test next.
Last edited by KevinH; Today at 08:29 PM.
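For comparison outside Sigil, the same bounded positive lookbehind can be approximated in an engine limited to fixed-width lookbehinds by ORing one fixed-width lookbehind per length, since any single width is enough for a positive assertion. A small sketch with Python's standard `re` module; the helper name `bounded_positive_lookbehind` is made up for illustration, and the result says nothing about what PCRE2 10.44 or 10.45 will do with the original pattern:

```python
import re

def bounded_positive_lookbehind(max_len: int) -> re.Pattern:
    # Alternation of fixed-width lookbehinds: (?<=0x{1})|(?<=0x{2})|...
    # Any one of them succeeding is enough, mirroring (?<=0x{1,max_len}).
    alts = "|".join(rf"(?<=0x{{{n}}})" for n in range(1, max_len + 1))
    return re.compile(rf"(?:{alts})y")

pattern = bounded_positive_lookbehind(6)
m = pattern.search("<p> 0xxxy </p>")
print(m.group(0) if m else None, m.start() if m else None)
```

With the test paragraph above, the match is the single `y` at offset 8; a `y` preceded by `0` and more than six `x` characters is not matched.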
#14
Sigil Developer
Posts: 8,485
Karma: 5703586
Join Date: Nov 2009
Device: many
Okay I tested the following in Sigil 2.50 (pre-release):
The xhtml file was:
Code:
<p>Lorem “ipsum dolor” Jack said. “</p>
<p>Lorem ipsum dolor said. “</p>
<p>Dolor amet said. “</p>
<p> 0xxxy </p>
Code:
(?<!”\s\w{1,6}\s)said\. “
So as far as I can tell with these examples, all is working. But again this version of Sigil has a newer version of PCRE2 (10.45) than the version that came in Sigil 2.4.2 (10.44), so since you are seeing something different I would guess that there was a PCRE2 bug in 10.44 that got fixed.
If it is any help, we are hoping to do final updates of the translations this week and will try to make a full release by next weekend if both of us can work it into our schedules. If you desperately need something immediately, I can generate a CI build of current Sigil master (it will be missing translations in most languages) and make a link available to you.
But please test your Sigil 2.4.2 build and let us know if it fails these very specific tests (i.e. if there was a PCRE2 bug).
Last edited by KevinH; Today at 08:37 PM.