Thread: RegEx & Unicode
View Single Post
Old 12-01-2011, 04:23 AM   #9
ldolse
Wizard
ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I think re.UNICODE only causes \w & \W to match non-ascii characters, at least practically speaking. Which would be okay except that \w also includes numbers - if you're okay with matching numbers then \w+ should be ok.

I've always wished it would make [a-zA-Z] work the way capnm wants. I suppose you might be able to mix it with an digit exclusion lookahead:

(?u)(?=[^\d]+)(\w+)

But it's going to get tricky.

Last edited by ldolse; 12-01-2011 at 04:26 AM.
ldolse is offline   Reply With Quote