06-14-2012, 12:16 PM   #11
ElMiko
Evangelist
Quote:
Originally Posted by DiapDealer
\b doesn't really "match" any characters—or more technically, its match is zero-length. It matches word boundaries. Which can be:

* Before the first character in the string, if the first character is a word character.
* After the last character in the string, if the last character is a word character.
* Between two characters in the string, where one is a word character and the other is not a word character.

A word character—without the (*UCP) flag—is [a-zA-Z0-9_] or \w

"There's"—for better or worse—is not one word in the eyes of regex. Because an apostrophe is not a word character. "There" would be one word and "s" would be another.

What are you wishing ’\b would find?
Again, thanks for the tutorial. Why is it that when an MR poster explains something it makes complete sense, but when I try to read an official regex tutorial I actually feel my brain cells dying and my life expectancy withering?

The scenario I'm trying to catch is instances in which OCR software interpreted a ” as a ’. My guess is that the appropriate regex would be something like ’\b(?!\p{Ll}). I could also probably add a negative lookbehind to exclude common instances of the ’ functioning as a (plural) possessive or denoting an omitted character (maybe something like (?<!s|in)). Mostly my question was academic: just a way for me to get a better understanding of how and why regex behaves the way it does.
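
For anyone who wants to poke at this outside the editor, here is a rough Python sketch of the two ideas above: where \b actually matches in "There's", and what a pattern along the lines of ’\b(?!\p{Ll}) picks up. It rests on a few assumptions. Python's built-in re module does not support \p{Ll}, so [a-z] stands in for it (ASCII only), and the sample strings are made up for illustration. Python is also not the PCRE engine discussed in this thread, so treat the results as indicative rather than definitive.

```python
import re

# Positions of the zero-length \b matches in "There's".
# The apostrophe is not a word character, so a boundary sits on each side of it.
boundaries = [m.start() for m in re.finditer(r"\b", "There's")]

# ASCII-only stand-in for ’\b(?!\p{Ll}); the stdlib re module has no \p{Ll}.
suspect_quote = re.compile(r"’\b(?![a-z])")

# Hypothetical sample strings, invented for this sketch.
samples = [
    "don’t",         # ’ followed by a lowercase letter
    "’Twas late",    # ’ followed by an uppercase letter
    "he said’ and",  # ’ followed by a space
]
hits = [bool(suspect_quote.search(s)) for s in samples]

print(boundaries)
print(hits)
```

Under these assumptions only the second sample matches. The third does not, because ’ and the space are both non-word characters, so there is no word boundary between them for \b to match.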

Last edited by ElMiko; 06-14-2012 at 12:18 PM.