MobileRead Forums - View Single Post

DiapDealer · 06-13-2012, 02:54 PM

Turn on Unicode properties with (*UCP) so that \b becomes Unicode-aware. Otherwise it treats accented characters like é and ö as non-word characters, so it finds a word boundary right after the s in séance or the t in töten.

Code:

(*UCP)\s(?=([st]|re|ve|ll)\b)

I used this text as a test case:

Code:

<p>a séance töten don t</p>
<p>don tyou see sheriff s</p>
<p>we ll I'll be a mönkey s uncle</p>
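
If you want to see the difference outside the editor, here is a rough sketch in Python. It is only an approximation: Python's re module does not accept (*UCP). Its str patterns are Unicode-aware by default, and I'm using the re.ASCII flag as a stand-in for PCRE without UCP. The function name find_groups is made up for this example.

```python
import re

# The expression from the post, minus the (*UCP) prefix, which Python's re
# module does not support.
PATTERN = r"\s(?=([st]|re|ve|ll)\b)"

TEST_TEXT = (
    "<p>a séance töten don t</p>\n"
    "<p>don tyou see sheriff s</p>\n"
    "<p>we ll I'll be a mönkey s uncle</p>"
)


def find_groups(text, unicode_aware):
    """Return the fragment (s, t, re, ve, ll) that follows each matched space.

    unicode_aware=False adds re.ASCII, an assumed stand-in for running the
    PCRE expression without (*UCP): \\b then only knows ASCII word characters.
    """
    flags = 0 if unicode_aware else re.ASCII
    return [m.group(1) for m in re.finditer(PATTERN, text, flags)]


if __name__ == "__main__":
    # Unicode-aware: only the stray fragments match.
    print(find_groups(TEST_TEXT, unicode_aware=True))
    # ASCII-only: the spaces before "séance" and "töten" match as well.
    print(find_groups(TEST_TEXT, unicode_aware=False))
```

The ASCII-only run picks up two extra matches, the spaces in front of séance and töten, which is the false-positive behaviour the (*UCP) prefix gets rid of.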
