Quote:
Originally Posted by ldolse
I think re.UNICODE only causes \w & \W to match non-ASCII characters, at least practically speaking. That would be okay, except that \w also includes numbers; if you're okay with matching numbers, then \w+ should be fine.
I've always wished it would make [a-zA-Z] work the way capnm wants. I suppose you might be able to mix it with a digit-exclusion lookahead:
(?u)(?=[^\d]+)(\w+)
But it's going to get tricky.
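A quick Python 3 check of both points in the quote, using made-up sample strings. In Python 3, str patterns are Unicode-aware by default, so the (?u) prefix is accepted but redundant there:

```python
import re

text = "café 42 naïve_x 123abc abc123"

# With Unicode matching, \w covers accented letters, but also digits and "_".
words = re.findall(r"(?u)\w+", text)
print(words)  # ['café', '42', 'naïve_x', '123abc', 'abc123']

# The lookahead (?=[^\d]+) only requires that a match *start* on a non-digit:
# "42" is skipped and leading digits are trimmed off "123abc", but digits
# later in a word (as in "abc123") are still consumed by \w+.
guarded = re.findall(r"(?u)(?=[^\d]+)(\w+)", text)
print(guarded)  # ['café', 'naïve_x', 'abc', 'abc123']
```

That last match is where it gets tricky: the lookahead guards the first character only, not the whole word.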
And what I really want is some form of (?u)[a-z] or (?u)[A-Z] to work, but I think I'm out of luck on that one.
I played/poked around a bit and here's what I found (which may even be correct):
This flavor of Python regex supports (?u), which makes \w, \d, and \b Unicode-aware.
It doesn't support \unnnn or \Unnnnnnnn.
It doesn't support Unicode upper/lowercase properties (such as \p{Lu}) or POSIX character classes (such as [[:upper:]]).
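To check these findings against a current interpreter, here is a small Python 3 sketch. The sample strings are made up, and modern Python's re differs from the older flavor on one point: since 3.3 it does accept \uXXXX escapes in str patterns.

```python
import re

# \d with Unicode matching also accepts non-ASCII decimal digits,
# e.g. U+0663 ARABIC-INDIC DIGIT THREE.
arabic_digit = re.fullmatch(r"(?u)\d", "\u0663") is not None

# \b treats accented letters as word characters, so "ï" and "é" don't split words.
bounded = re.findall(r"(?u)\b\w+\b", "naïve café")

# An explicit range is still a plain code-point range; (?u) does not widen it.
ranged = re.findall(r"(?u)[a-z]+", "café")

# Property classes such as \p{Lu} are not part of re's syntax; since
# Python 3.7 the unknown escape is rejected at compile time.
try:
    re.compile(r"\p{Lu}")
    p_class_supported = True
except re.error:
    p_class_supported = False

# Unlike the older flavor, Python 3.3+ accepts \uXXXX escapes in str patterns.
u_escape = re.fullmatch(r"\u00e9", "é") is not None

print(arabic_digit, bounded, ranged, p_class_supported, u_escape)
# True ['naïve', 'café'] ['caf'] False True
```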
Revising your lookahead idea, I think this will emulate a Unicode-aware [a-zA-Z]:
(?u)\w(?!(?<=[\d_]))
But that matches letters of either case, so it still doesn't grant my wish for a Unicode-aware [a-z] or [A-Z] on its own...
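For comparison, here is a Python 3 sketch of the lookaround pattern next to [^\W\d_], a common way of spelling the same set ("word characters minus digits minus underscore"). The negated class is an alternative spelling, not something proposed in the thread, and the sample string is made up:

```python
import re

text = "Größe 42_x ÿ"

# Lookbehind inside a negative lookahead: consume one \w character, then
# reject it if the character just consumed was a digit or an underscore.
via_lookaround = re.findall(r"(?u)(?:\w(?!(?<=[\d_])))+", text)

# The same set as a negated class: not a non-word char, not a digit, not "_".
via_class = re.findall(r"(?u)[^\W\d_]+", text)

print(via_lookaround)  # ['Größe', 'x', 'ÿ']
print(via_class)       # ['Größe', 'x', 'ÿ']
```

Both happily match upper- and lowercase letters alike, which is exactly why neither one is a Unicode [a-z] or [A-Z].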
Oh, well. This was supposed to be a quick exercise in tweaking some template code. Now I'm just being stubborn.
Since I don't foresee any great inspiration on how to make a Unicode [a-z], I'll probably settle for adding [à-ÿ] to at least make it Latin-1 aware...
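One thing worth checking before settling on the Latin-1 route: the range à-ÿ (U+00E0 to U+00FF) also contains the division sign ÷ (U+00F7), and the matching uppercase range À-Þ contains × (U+00D7). A minimal Python 3 sketch that splits the ranges around those two signs (the variable names are illustrative, and adding ß at U+00DF to the lowercase set is an assumption about what "Latin-1 lowercase" should include):

```python
import re

# à-ÿ covers U+00E0..U+00FF, which includes ÷ (U+00F7).
naive_lower = re.compile(r"[a-zà-ÿ]+")

# Split around ÷ and pick up ß: ß-ö is U+00DF..U+00F6, ø-ÿ is U+00F8..U+00FF.
latin1_lower = re.compile(r"[a-zß-öø-ÿ]+")

# Uppercase counterpart: À-Ö is U+00C0..U+00D6, Ø-Þ is U+00D8..U+00DE (skips ×).
latin1_upper = re.compile(r"[A-ZÀ-ÖØ-Þ]+")

naive_hits = naive_lower.findall("café÷naïve")
lower_hits = latin1_lower.findall("café÷naïve")
upper_hits = latin1_upper.findall("ÉCOLE×ÅR")
print(naive_hits, lower_hits, upper_hits)
# ['café÷naïve'] ['café', 'naïve'] ['ÉCOLE', 'ÅR']
```

This still only covers Latin-1; letters such as ŷ or ł fall outside both classes.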