#1
Evangelist
Posts: 446
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Matching words without using repetition operators
Often, I find that OCR software omits the final punctuation mark between the last letter of a sentence and the closing quotation mark:
Code:
eg. “My job is exhausting” Tom said laboriously.
Code:
eg. Please define the words “trustworthy” and “gullible”.
Code:
(?<!“[\p{L}]+)(?<=\p{L})”
#2
frumious Bandersnatch
Posts: 7,548
Karma: 19500001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
You could try a two-step (or three-step) process:
1. Replace all single-word quotation cases with something that prevents a match in the next step. Something like searching for (“[^ ]+)” and replacing with \1¬”.
2. Do your normal search for unpunctuated quotes.
3. Remove all ¬.
Anyway, you shouldn't do a global search and replace; there may be cases of multiple quoted words without punctuation, or single-word speeches:
'What do you mean with "I don't know"?' he said. 'Weren't you listening?'
'No.'
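For anyone who wants to try the three steps outside the editor, here is a rough Python sketch. The step 1 pattern and the ¬ placeholder are the ones above; the step 2 pattern is only an assumption standing in for "your normal search", and the function name is made up for illustration. Python's re module has no \p{L}, so [^\W\d_] is used as an approximation.

```python
import re

MARK = "¬"  # placeholder from the post; assumed not to occur in the book text

# Step 1 pattern from the post: an opening quote, a run of non-space
# characters, then the closing quote (in other words, one quoted word).
SINGLE_WORD = re.compile(r"(“[^ ]+)”")

# Step 2 pattern is an assumption standing in for "your normal search":
# a quotation whose last character before the closing quote is a letter.
# [^\W\d_] approximates \p{L}, which Python's re module does not support.
UNPUNCTUATED = re.compile(r"“[^”]*[^\W\d_]”")


def find_unpunctuated_quotes(text):
    """Return (candidates, restored_text); candidates are for manual review."""
    protected = SINGLE_WORD.sub(r"\1" + MARK + "”", text)  # step 1: protect
    candidates = UNPUNCTUATED.findall(protected)           # step 2: search
    restored = protected.replace(MARK, "")                 # step 3: clean up
    return candidates, restored


sample = (
    "“My job is exhausting” Tom said laboriously. "
    "Please define the words “trustworthy” and “gullible”."
)
candidates, restored = find_unpunctuated_quotes(sample)
print(candidates)
```

It only lists candidates instead of replacing anything, in line with the warning about global search and replace. A single-word speech such as “No” gets protected in step 1 and will not be reported, so those still have to be checked by hand.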
#3
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
I believe this regex might also help in catching some of these:
Code:
(“[^ ”]+ [^”,]+)(”)
Code:
\1,\2
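A quick way to see what this pair does is to run it through Python's re module, which accepts both expressions unchanged. This is just a sketch; the function name and the third test sentence are made up for illustration.

```python
import re

# Search and replace expressions exactly as posted above.
SEARCH = re.compile(r"(“[^ ”]+ [^”,]+)(”)")
REPLACE = r"\1,\2"


def insert_commas(text):
    """Insert a comma before the closing quote of every match."""
    return SEARCH.sub(REPLACE, text)


# A multi-word quote gets a comma before the closing quote.
print(insert_commas("“My job is exhausting” Tom said laboriously."))

# Single quoted words contain no space, so they are left alone.
print(insert_commas("Please define the words “trustworthy” and “gullible”."))

# Caveat: only commas are excluded, so a quote that already ends in a
# full stop still matches and picks up an extra comma.
print(insert_commas("“I am tired.” he said."))
```

The last case is a reason to step through matches one at a time instead of using Replace All.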
#4
Evangelist
Posts: 446
Karma: 65460
Join Date: Jun 2011
Device: Kindle
ahhhhh, interesting. Never even occurred to me to break it up like that. Obviously, I'm still holding out hope for something that can be done in a single search, but failing that, your solution will work nicely. Thanks, Jellby!
@Tex2002ans - Thanks for the input! I'm not quite following all the pieces of your search, though... particularly the highlighted part below.
Code:
(“[^ ”]+\s[^”,]+)(”)
Last edited by ElMiko; 07-06-2012 at 04:54 AM.
#5
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Actually I made a slight mistake in that regex. Here is a better version:
Code:
(“[^ ”]+\s[^”]+[^,!?.])(”)
The right quotation mark is NEEDED inside the [^”]+ part. This means that after the first word, it will continue to grab everything UP TO the right double quotation mark.
The characters in the final [^,!?.] part are the ones that count as valid closing punctuation. They say "if the quotation ends with this character, it is valid, so skip over this." In this case, if the last character is a ',', '!', '?', or '.', the quote is valid and is not matched.
The last group, (”), just grabs the right quotation mark and makes it easy to do a Search and Replace.
Code:
“My job is exhausting. My job is very exhausting! Did I mention that my job is extremely exhausting” Tom said laboriously.
Code:
(“[^ ”]+\s[^”]+[^g])(”)
Last edited by Tex2002ans; 07-06-2012 at 07:13 AM.
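Here is a small Python sketch for checking the better version against the sample text. The final class is written here as [^,!?.], matching the four characters named in the explanation. The \1,\2 replacement is carried over from my earlier post, and the function name is made up for illustration. The [^g] variant is included on the assumption that it was meant to show how the last class works.

```python
import re

# Better version: the last character before the closing quote must not be
# one of , ! ? . (class written as [^,!?.], per the explanation above).
BETTER = re.compile(r"(“[^ ”]+\s[^”]+[^,!?.])(”)")

# Variant from the last code box: here the final character must not be "g".
G_VARIANT = re.compile(r"(“[^ ”]+\s[^”]+[^g])(”)")


def punctuate_quotes(text):
    """Insert a comma before the closing quote of each unpunctuated quote."""
    return BETTER.sub(r"\1,\2", text)


sample = (
    "“My job is exhausting. My job is very exhausting! Did I mention "
    "that my job is extremely exhausting” Tom said laboriously."
)

# Punctuation inside the quote does not matter, only the last character does.
print(punctuate_quotes(sample))

# A quote that already ends in a full stop is skipped.
print(punctuate_quotes("“I am tired.” he said."))

# The sample ends in "g", so the [^g] variant finds nothing in it.
print(G_VARIANT.search(sample))
```

As before, this is meant for testing the pattern on a few sentences, not as a recommendation to run it over a whole book in one go.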