I think re.UNICODE mostly matters for \w and \W, which then match non-ASCII characters, at least practically speaking (it also affects \b, \d and \s). That would be fine except that \w also includes digits and the underscore. If you're okay with matching those, then \w+ should be OK.
I've always wished it would make [a-zA-Z] work the way capnm wants. I suppose you might be able to mix it with a digit-exclusion lookahead:
(?u)((?:(?!\d)\w)+)
But it's going to get tricky.
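Here's a rough sketch of where it gets tricky, assuming Python 3 (where str patterns are already Unicode-aware, so (?u) is redundant but harmless). A lookahead placed once in front, as in (?u)(?=[^\d]+)(\w+), only checks that the first character isn't a digit, so digits later in the word still get through. Repeating the lookahead per character, or using the [^\W\d_] trick, avoids that. The sample string and variable names are made up for illustration.

```python
import re

# Sample text (hypothetical): an accented word, a word with trailing
# digits, a bare number, and a word containing an underscore.
text = "caf\u00e9 abc123 42 x_y"

# Lookahead applied once: only guarantees the first character is not a digit.
loose = re.compile(r"(?u)(?=[^\d]+)(\w+)")

# Lookahead applied to every character: digits are excluded throughout,
# but the underscore still counts as a word character.
strict = re.compile(r"(?u)(?:(?!\d)\w)+")

# "Word characters that are neither digits nor underscore", i.e. letters only.
letters = re.compile(r"(?u)[^\W\d_]+")

loose_hits = loose.findall(text)      # ['café', 'abc123', 'x_y']
strict_hits = strict.findall(text)    # ['café', 'abc', 'x_y']
letter_hits = letters.findall(text)   # ['café', 'abc', 'x', 'y']

print(loose_hits)
print(strict_hits)
print(letter_hits)
```

If underscores should be left out as well, the [^\W\d_]+ form is probably the closest thing to a Unicode-aware [a-zA-Z]+.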
Last edited by ldolse; 12-01-2011 at 04:26 AM.