01-22-2025, 02:42 PM   #7
ElMiko
Evangelist
ElMiko has read every ebook posted at MobileRead
Posts: 473
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by DiapDealer
Oh, my bad. I was thinking of the bundled Python available for plugins in the newer Sigil. You are definitely limited to standard PCRE regex when using Sigil's search and replace. So lookaheads can be variable width, but lookbehinds can't. Sorry.

You still should be able to get Kevin's search working, though. Why do you need to capture JUST the "were"? What's the end goal? Are you looking to replace "were" with something else, including nothing (deleting)?
Yeah, I knew there were things the newer toy was going to do a heck of a lot better than the old one, but I just couldn't get the WYSIWYG functionality working the way I wanted in the newer "plug-in" version. And, to respond to JSWolf: since I do A LOT of formatting in the WYSIWYG editor, I'm stuck with Old Faithful.

In response to your question: basically, I'm trying to isolate instances of "were" that ought to have been "we're". In other words, instances in which the original OCR failed to capture the apostrophe marking the contraction.

This mostly occurs in dialogue (rather than narrative), so I'm trying to quickly review each instance of the word and replace it where appropriate.

What I've got now (following your revelation about the lookahead) is:
Code:
(?<!\b[Tt]hey |\b[Ww]e |\b[Tt]he[rs]e |\b[Oo]thers |\b[Pp]eople |\b[Ss]ome |\b[Ss]he |\b[Yy]ou |\bit )\b([Ww])ere\b(?=[^“]*?”)
with the replace value as:
Code:
\1e’re
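For anyone who wants to run the same search outside Sigil, here is a rough Python sketch (an illustration only, not Sigil's own code). Python's built-in `re` module is stricter than PCRE: it rejects a lookbehind whose alternatives have different lengths, so the sketch splits the one big negative lookbehind into a chain of fixed-width ones. "Not preceded by any of these words" means the same thing either way. The names `EXCLUDED_SUBJECTS`, `WERE_IN_DIALOGUE` and `fix_were` are made up for this example.

```python
import re

# Words that legitimately come before "were" in the Sigil pattern.
# Each entry is fixed-width on its own, which Python's re requires
# inside a lookbehind.
EXCLUDED_SUBJECTS = [
    r"\b[Tt]hey ", r"\b[Ww]e ", r"\b[Tt]he[rs]e ", r"\b[Oo]thers ",
    r"\b[Pp]eople ", r"\b[Ss]ome ", r"\b[Ss]he ", r"\b[Yy]ou ", r"\bit ",
]

WERE_IN_DIALOGUE = re.compile(
    # Chained negative lookbehinds: fail if ANY excluded word precedes,
    # equivalent to PCRE's single (?<!a |bb |ccc ) alternation.
    "".join(f"(?<!{s})" for s in EXCLUDED_SUBJECTS)
    # Capture the W/w so the replacement keeps the original case.
    + r"\b([Ww])ere\b"
    # Require a closing curly quote before any opening one,
    # i.e. the word sits inside dialogue.
    + r"(?=[^“]*?”)"
)


def fix_were(text: str) -> tuple[str, int]:
    """Return (fixed_text, number_of_replacements)."""
    # Same replacement as in Sigil: group 1, then "e’re".
    return WERE_IN_DIALOGUE.subn(r"\1e’re", text)
```

For example, `fix_were('“I think were lost,” she said.')` returns `('“I think we’re lost,” she said.', 1)`, while "They were tired." outside quotes is left alone. Like a Sigil "Replace All", `subn` changes every match at once; to review matches one at a time, iterate over `WERE_IN_DIALOGUE.finditer(text)` instead.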
I suspect there will be some edge cases where this doesn't work, but it's already a darn sight better than before: 23 matches now (clunky, but doable) vs. 187 matches previously (pure nightmare).

Thanks, guys!

Last edited by ElMiko; 01-22-2025 at 02:47 PM.