Quote:
Originally Posted by odamizu
Another regex question if you will indulge me:
What's the difference between (?U) and ?
Is there an advantage to using (?U).* rather than .*? or vice versa?
Thank you!
If you are dealing with Unicode and using Python 2, (?u) would be useful: it makes \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character properties and makes IGNORECASE match non-ASCII characters. (In Python the inline flag is spelled with a lowercase u; re.U is the module-level constant.) It is likely documented elsewhere too, but check "7.2. re — Regular expression operations" in the Python 2 documentation for more information.
Please note that it is not the same as ? and is not used the same way: (?u) tells the engine to treat the pattern and input as Unicode, so it changes how both are interpreted but is not itself part of either string.
So something like (?u)(.*?) instead of (.*?) if you want to match on Unicode.
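To make the "not the same as ?" point concrete, here is a minimal sketch, assuming Python 3 and a made-up sample string. It only illustrates that the lazy ? changes how much .* consumes, while (?u) leaves greediness alone:

```python
import re

text = "<a><b>"  # hypothetical sample input, not from the original question

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r"<.*>", text).group(0)

# Lazy: .*? grabs as little as it can, so the match stops at the first '>'.
lazy = re.match(r"<.*?>", text).group(0)

# (?u) is a flag, not a quantifier modifier: the match is still greedy.
flagged = re.match(r"(?u)<.*>", text).group(0)

print(greedy)   # <a><b>
print(lazy)     # <a>
print(flagged)  # <a><b>
```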
OTOH, as far as I remember, Python 3 matches str patterns on Unicode by default, making (?u) and its equivalents (re.U, re.UNICODE) redundant there.
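If you want to check this on your own install, a quick sketch along these lines should do, assuming Python 3 and a made-up sample word. re.ASCII is the Python 3 flag that switches back to ASCII-only matching:

```python
import re

word = "caf\u00e9"  # "café", a hypothetical sample with one non-ASCII letter

# Python 3 str patterns are Unicode-aware without any flag.
default = re.findall(r"\w+", word)

# Adding (?u) explicitly gives the same result.
with_flag = re.findall(r"(?u)\w+", word)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letter is dropped.
ascii_only = re.findall(r"\w+", word, re.ASCII)

print(default == with_flag)  # True
print(ascii_only)            # ['caf']
```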