MobileRead Forums - View Single Post

ldolse · 12-01-2011, 05:23 AM

I think re.UNICODE only causes \w & \W to match non-ASCII characters, at least practically speaking (it also changes \b, \d and \s, but that rarely matters here). Which would be okay except that \w also includes digits and the underscore - if you're okay with matching those then \w+ should be OK.
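A quick way to see the difference, as a minimal sketch. It assumes Python 3, where re.UNICODE is already the default for str patterns and re.ASCII reproduces the old behaviour without the flag; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text: two accented words, a number, an underscore word.
text = "na\u00efve caf\u00e9 2011 foo_bar"

# re.ASCII stands in for "no re.UNICODE": \w is [a-zA-Z0-9_] only,
# so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# With re.UNICODE, \w also matches non-ASCII letters,
# but digits and the underscore still count as word characters.
unicode_words = re.findall(r"\w+", text, re.UNICODE)

print(ascii_words)    # ['na', 've', 'caf', '2011', 'foo_bar']
print(unicode_words)  # ['naïve', 'café', '2011', 'foo_bar']
```

So the flag fixes the accented letters, but "2011" and "foo_bar" still come through as words.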

I've always wished it would make [a-zA-Z] work the way capnm wants. I suppose you might be able to mix it with a digit-exclusion lookahead:

(?u)(?=[^\d]+)(\w+)

But it's going to get tricky.
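To show where it gets tricky, here is a rough sketch, again assuming Python 3 and a made-up sample string. The lookahead above only checks the character at the start of the match, so it rejects words that begin with a digit but still lets digits through later in the word. The [^\W\d_] character class is not from this thread; it is a commonly used alternative that means "a word character that is neither a digit nor the underscore", i.e. letters only.

```python
import re

# Hypothetical sample text for illustration.
text = "caf\u00e9 abc123 2011 na\u00efve"

# The lookahead from the post: only the first character must be a non-digit.
lookahead = re.compile(r"(?u)(?=[^\d]+)(\w+)")

# Alternative (an assumption, not the poster's method): double negation
# leaves only Unicode letters.
letters = re.compile(r"[^\W\d_]+")

# Same class with word boundaries: only words made entirely of letters.
whole_words = re.compile(r"\b[^\W\d_]+\b")

print(lookahead.findall(text))    # ['café', 'abc123', 'naïve']
print(letters.findall(text))      # ['café', 'abc', 'naïve']
print(whole_words.findall(text))  # ['café', 'naïve']
```

Whether "abc123" should yield "abc" or nothing at all depends on what capnm is after, which is why two variants are shown.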

Last edited by ldolse; 12-01-2011 at 05:26 AM.