#1
Addict
Posts: 326
Karma: 56788
Join Date: Jun 2011
Device: Kindle
[RegEx] How to match a string occurring somewhere between quotation marks
I am trying to find a word, but only when it occurs between quotation marks. For example, I want to find the word "were", but only when it occurs in dialogue (the punctuation has been smartened, by the way). As in:
Code:
“That’s where were going!”
Code:
(?<=“.*?)\bwere\b(?=.*?”)
Is there a way to match *and* isolate the "were"? Or is matching the whole dialogue string the best one can hope for (e.g., “.*?\bwere\b.*?”)?
Last edited by ElMiko; 01-22-2025 at 08:18 AM.
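One caveat with the fallback pattern is easy to check with Python's built-in re module (used here only as a stand-in for Sigil's engine; the sample sentences are made up). Because `.*?` can step over a closing quote, `“.*?\bwere\b.*?”` also matches a "were" sitting in narration between two quotations. Restricting the text on either side of the word with negated classes such as `[^”]` keeps the match inside a single quotation.

```python
import re

# Fallback from the question: the lazy dot can cross a closing ”.
LOOSE = re.compile(r"“.*?\bwere\b.*?”")
# Same idea, but the text before "were" may not contain a closing quote and
# the text after it may not contain either quote character.
TIGHT = re.compile(r"“[^”]*\bwere\b[^“”]*”")

# Hypothetical test sentences.
in_dialogue = "“That’s where were going!”"
in_narration = "“Hi,” they were told. “Go.”"

for label, text in [("dialogue", in_dialogue), ("narration", in_narration)]:
    print(label, bool(LOOSE.search(text)), bool(TIGHT.search(text)))
```

This prints `dialogue True True` and then `narration True False`: the loose pattern flags the narration "were", the tight one does not.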
#2
Sigil Developer
Posts: 8,304
Karma: 5568878
Join Date: Nov 2009
Device: many
Why use lookaheads and lookbehinds, given that smart quotes already make the opening and closing of dialogue directional?
Have you tried something simpler, like:
Code:
“[^”]*\s(were)[,;!?.\s][^“”]*”
It looks for an opening quote, then any number of characters that are not a closing quote, followed by a whitespace character, then the word in question (captured as group 1), followed by either whitespace or a punctuation mark, then any number of characters that are neither an opening nor a closing quote, and finally a closing quote. Give a version of that a try. We use the same approach to find the next opening or closing tag with regex, with < and > in place of the smart quotes.
Last edited by KevinH; 01-22-2025 at 10:45 AM.
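A minimal sketch of the capture-group idea, again using Python's re as a stand-in for Sigil's PCRE engine: the pattern still matches the whole quotation, but group 1 records exactly where "were" sits, so a script can act on just that word. The helper name `find_were` and the sample text are hypothetical.

```python
import re

# Kevin's pattern: opening quote, text without a closing quote, whitespace,
# the captured word, a space or punctuation mark, then the rest of the
# quotation up to the closing quote.
DIALOGUE_WERE = re.compile(r"“[^”]*\s(were)[,;!?.\s][^“”]*”")

def find_were(text):
    """Return (whole_match, start, end) of group 1, or None if no match."""
    m = DIALOGUE_WERE.search(text)
    if m is None:
        return None
    return m.group(0), m.start(1), m.end(1)

sample = "“That’s where were going!”"
whole, start, end = find_were(sample)
print(whole)              # the entire quotation
print(sample[start:end])  # just the captured word
```

The overall match is still the full dialogue string; the isolation comes from the group's span, not from the match boundaries.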
#3
Grand Sorcerer
Posts: 28,173
Karma: 201721072
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
But you're right that the built-in re module does not allow variable-width lookbehinds.
Last edited by DiapDealer; 01-22-2025 at 09:49 AM.
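This is easy to confirm from a Python prompt: the standard-library re module raises `re.error` when a lookbehind has variable width, while a variable-width lookahead compiles fine. (The third-party `regex` module by Matthew Barnett does accept variable-width lookbehinds; it is left out here so the sketch runs with the standard library alone.) The helper `compiles` is just for illustration.

```python
import re

def compiles(pattern):
    """Return True if the stdlib re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# A lookbehind containing .*? has no fixed width, so re rejects it.
print(compiles(r"(?<=“.*?)\bwere\b(?=.*?”)"))
# A variable-width lookahead is allowed.
print(compiles(r"\bwere\b(?=[^“]*?”)"))
```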
#4
Addict
Posts: 326
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
“That’s where were going!” rather than just the "were". This is a similar result to my clunkier:
Code:
“.*?\bwere\b.*?”
@DiapDealer: Ahhhh... see, that's why it pays to hedge one's statements when one doesn't know what the heck he's talking about! That's really helpful. Full disclosure: I'm still using an ancient version of Sigil (0.7.2). How can I (or can I even) use the Barnett regex module with it? Otherwise, I've modified my search to:
Code:
\bwere\b(?=[^“]*?”)
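The lookahead-only version works because a "were" inside dialogue reaches a closing ” before it meets another opening “, while one in narration does not. A minimal sketch with Python's re and made-up sample text; it assumes every quotation is closed by ” before the next “ begins, so a quotation left open (such as the first paragraph of multi-paragraph dialogue) would be missed.

```python
import re

# "were" counts as dialogue only if a closing quote appears before any
# opening quote; [^“] may cross spaces, punctuation and newlines.
IN_DIALOGUE = re.compile(r"\bwere\b(?=[^“]*?”)")

text = "They were late. “That’s where were going!”"
hits = [m.start() for m in IN_DIALOGUE.finditer(text)]
for pos in hits:
    print(pos, text[pos:pos + 10])
```

Only the second "were" is reported; the narration "were" fails the lookahead because the next quote character after it is an opening “.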
#5
Resident Curmudgeon
Posts: 77,845
Karma: 142032074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Why not upgrade your Sigil to the latest version?
#6
Grand Sorcerer
Posts: 28,173
Karma: 201721072
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Oh, my bad. I was thinking of the bundled Python available to plugins in newer versions of Sigil. In Sigil's search and replace you are definitely limited to standard PCRE regex, so lookaheads can be variable width, but lookbehinds cannot. Sorry.
You should still be able to get Kevin's search working, though. Why do you need to capture JUST the "were"? What's the end goal? Are you looking to replace "were" with something else, including nothing (deleting it)? If the lookbehind is what's holding you back, try refactoring it with \K instead. \K tells the engine to discard everything matched up to that point, so the reported match starts immediately after the \K.
Last edited by DiapDealer; 01-22-2025 at 03:42 PM.
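Python's built-in re has no `\K` (it rejects the escape), so in a script the usual equivalent is to capture everything `\K` would discard and write it back unchanged in the replacement. A minimal sketch, assuming the goal is turning dialogue "were" into "we’re"; the pattern, the helper name `fix_were_in_dialogue` and the sample sentence are illustrative, not Sigil's.

```python
import re

# Group 1 holds the part a PCRE pattern would throw away with \K:
# the opening quote and the text up to "were" (lazy, so the first one).
QUOTE_PREFIX_WERE = re.compile(r"(“[^”]*?\b)were\b(?=[^“]*?”)")

def fix_were_in_dialogue(text):
    # Re-emit the kept prefix, then the corrected contraction.
    return QUOTE_PREFIX_WERE.sub(r"\1we’re", text)

print(fix_were_in_dialogue("They were late. “That’s where were going!”"))
```

The narration "They were" is untouched because the pattern can only start at an opening quote.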
#7
Addict
Posts: 326
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Quote:
In response to your question: basically, I'm trying to isolate instances of "were" that ought to have been "we're". In other words, instances in which the original OCR failed to capture the apostrophe marking the contraction. This mostly occurs in dialogue (rather than narrative), so I'm trying to quickly review the instances of the word and replace them where appropriate. What I've got now (following your revelation about the lookahead) is:
Code:
(?<!\b[Tt]hey |\b[Ww]e |\b[Tt]he[rs]e |\b[Oo]thers |\b[Pp]eople |\b[Ss]ome |\b[Ss]he |\b[Yy]ou |\bit )\b([Ww])ere\b(?=[^“]*?”)
with the replacement:
Code:
\1e’re
Thanks, guys!
Last edited by ElMiko; 01-22-2025 at 03:47 PM.
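Unlike PCRE, Python's stdlib re also refuses a lookbehind whose alternatives differ in length, so a script version of this search has to split the alternation into a run of separate negative lookbehinds. "Not preceded by A or B" is the same as "not preceded by A and not preceded by B", so the logic is unchanged. A sketch under that assumption, reusing the exclusion list, lookahead and replacement from the Sigil search above; the helper name and sample sentences are hypothetical.

```python
import re

# Words that legitimately precede "were"; each becomes its own fixed-width
# negative lookbehind, which Python's re accepts.
EXCLUDED = [r"\b[Tt]hey ", r"\b[Ww]e ", r"\b[Tt]he[rs]e ", r"\b[Oo]thers ",
            r"\b[Pp]eople ", r"\b[Ss]ome ", r"\b[Ss]he ", r"\b[Yy]ou ", r"\bit "]
LOOKBEHINDS = "".join(f"(?<!{alt})" for alt in EXCLUDED)
PATTERN = re.compile(LOOKBEHINDS + r"\b([Ww])ere\b(?=[^“]*?”)")

def fix_missing_apostrophe(text):
    # \1 keeps the original case of the W.
    return PATTERN.sub(r"\1e’re", text)

sample = "“That’s where were going!” They were late. “You were right,” she said."
print(fix_missing_apostrophe(sample))
```

Only the first "were" changes: "They were" is excluded by both the lookbehind list and the lookahead, and "You were" by the lookbehind list.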
#8
Grand Sorcerer
Posts: 28,173
Karma: 201721072
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Try something like:
Code:
“[^”]*\b\Kwere\b(?=.*?”)
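One detail worth knowing about this pattern: because `[^”]*` is greedy, the engine backtracks from the end of the quotation, so the position where `\K` resets the match is the last "were" in each quotation; making it lazy (`[^”]*?`) selects the first one instead. Python's re has no `\K`, so this sketch stands in for it with a capture group and reports where group 1 starts, which is where the `\K` match would begin. The sample sentence is hypothetical.

```python
import re

# Capture-group stand-ins for “[^”]*\b\Kwere\b(?=.*?”):
# group 1 begins exactly where \K would start the reported match.
GREEDY = re.compile(r"“[^”]*\b(were)\b(?=.*?”)")
LAZY = re.compile(r"“[^”]*?\b(were)\b(?=.*?”)")

text = "“I know were going, and were late.”"
greedy_starts = [m.start(1) for m in GREEDY.finditer(text)]
lazy_starts = [m.start(1) for m in LAZY.finditer(text)]
print(greedy_starts, text[greedy_starts[0]:])
print(lazy_starts, text[lazy_starts[0]:])
```

Either way, a replace-all changes at most one "were" per quotation on each pass, since every match has to begin at its own opening quote.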
#9
Addict
Posts: 326
Karma: 56788
Join Date: Jun 2011
Device: Kindle
Oh...
My... Sexy... Well, that's a new tool for the toolbox. Amazing. Thank you!
#10
Grand Sorcerer
Posts: 28,173
Karma: 201721072
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
No problem. I'd actually almost forgotten about it. It used to be one of my go-tos.