I often find that OCR software omits the final punctuation mark between the last letter of a sentence and the closing quote:
Code:
eg. “My job is exhausting” Tom said laboriously.
My current approach is a regex search for every instance of a letter followed immediately by a closing quote. Unfortunately, this also matches instances where a single word is set off by quotation marks:
Code:
eg. Please define the words “trustworthy” and “gullible”.
I'm hoping I can slightly reduce the number of false positives by excluding instances in which the closing quote is preceded by a single word, which is itself immediately preceded by a single open-quote. My idea was:
Code:
(?<!“[\p{L}]+)(?<=\p{L})”
However, it looks like variable-length repetition (such as +) is not allowed inside lookbehind expressions, at least in the regex engine I'm using. Does anyone have any ideas?
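One possible workaround, sketched here in Python purely as an illustration (the original search may well run in a different tool), is to drop the lookbehind and let the pattern optionally capture an opening quote directly before the final word, then discard matches where that capture succeeded. The names LETTER, CANDIDATE and missing_punctuation_positions are made up for this sketch, and [^\W\d_] is only an approximation of \p{L}, since Python's built-in re module does not support \p{L}.

```python
import re

# Approximation of \p{L}: any "word" character that is not a digit or underscore.
# (Assumption: Python's built-in re module, which lacks \p{L}.)
LETTER = r"[^\W\d_]"

# Group 1 captures an opening quote only when it sits directly before the
# word that ends at the closing quote, i.e. the “word” case.
CANDIDATE = re.compile(rf"(“)?{LETTER}+”")


def missing_punctuation_positions(text):
    """Return the index of each closing quote that directly follows a letter,
    skipping cases where a single word is wrapped in quotes."""
    return [
        m.end() - 1                    # position of the closing quote
        for m in CANDIDATE.finditer(text)
        if m.group(1) is None          # no opening quote right before the word
    ]


if __name__ == "__main__":
    sample_a = "“My job is exhausting” Tom said laboriously."
    sample_b = "Please define the words “trustworthy” and “gullible”."
    print(missing_punctuation_positions(sample_a))
    print(missing_punctuation_positions(sample_b))
```

This only filters out quotes around exactly one word; a quoted phrase of two or more words that legitimately has no final punctuation would still be reported. Engines that do allow variable-length lookbehind should be able to run the original pattern as written.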