To make \w+ match non-ASCII characters, you need to add the Unicode flag to the regex: (?u), IIRC. Kovid's solution of \S+ works across all regex implementations, though, so there's no need to teach/confuse the user about flags that are Python-specific.
The locale might also make a difference, but I'm not really sure on that point. The Unicode flag is the usual fix for this issue.
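A quick sketch of the difference, using a made-up sample string. One caveat: in Python 3, str patterns are already Unicode-aware, so (?u) is redundant there. To reproduce the old ASCII-only \w behaviour (what Python 2 did without the flag), this example uses re.ASCII instead.

```python
import re

# Hypothetical sample text; \u escapes ensure precomposed accented letters.
text = "caf\u00e9 na\u00efve, r\u00e9sum\u00e9!"  # "café naïve, résumé!"

# ASCII-only \w: accented letters break words apart.
# This mimics Python 2's default \w behaviour.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware \w via the inline (?u) flag (the default for str in Python 3).
unicode_words = re.findall(r"(?u)\w+", text)

# \S+ needs no flag and keeps accented letters, but it splits only on
# whitespace, so trailing punctuation like "," and "!" stays attached.
nonspace_runs = re.findall(r"\S+", text)

print(ascii_words)    # ['caf', 'na', 've', 'r', 'sum']
print(unicode_words)  # ['café', 'naïve', 'résumé']
print(nonspace_runs)  # ['café', 'naïve,', 'résumé!']
```

So \S+ sidesteps the flag question entirely, at the cost of treating punctuation as part of a "word".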