Quote:
Originally Posted by DNSB
@diagdealer: Thanks for the correction. I ran into the (?u) as being Unicode related playing with Python a while back and used ? for when I wanted ungreedy.
No problem. It's an easy enough assumption to make. The short versions of the Python regex flags all line up pretty well with the PCRE mode modifiers, except re.U.
re.I (?i) ignore case
re.S (?s) single line (dot also matches newline)
re.M (?m) multiline (^ and $ also match at line breaks)
re.U turns on Unicode behavior for \w, \W, \b, and \B in Python, but (?U) puts quantifiers into ungreedy mode in PCRE.
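Here is a quick sketch of the Python side, using only the standard re module. The sample string and variable names are made up for illustration. It shows that the inline modifiers and the short flag constants behave the same, and that Python has no ungreedy mode switch, so you make each quantifier lazy with a trailing ? instead.

```python
import re

# Made-up sample text for illustration only.
text = "Foo\nfoo <a><b>"

# Inline modifiers and the short flag constants are interchangeable in Python.
ignore_case = re.findall(r"(?i)foo", text)
assert ignore_case == re.findall(r"foo", text, re.I)

multiline = re.findall(r"(?m)^foo", text)
assert multiline == re.findall(r"^foo", text, re.M)

single_line = re.search(r"(?s)Foo.foo", text).group()
assert single_line == re.search(r"Foo.foo", text, re.S).group()

# No ungreedy mode switch in Python: add ? after each quantifier instead.
greedy = re.search(r"<.+>", text).group()   # runs to the last ">" on the line
lazy = re.search(r"<.+?>", text).group()    # stops at the first ">"
```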
To turn on Unicode support for escapes like \w, \W, \d, and \b in PCRE, you need to prefix the expression with (*UCP). That only works if PCRE was compiled with Unicode support (which Sigil's PCRE clearly is).
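For comparison, here is a small sketch of the same distinction in Python 3, where it runs the other way around. This assumes Python 3 and str patterns, which are Unicode-aware by default, so re.U is accepted but changes nothing. The re.ASCII flag is what restricts \w to ASCII, which is roughly what PCRE does when (*UCP) is not given. The sample word is made up.

```python
import re

# "café" written with an explicit escape so the é is a single code point.
word = "caf\u00e9"

# Python 3 str patterns: \w is Unicode-aware by default, re.U is redundant.
default_match = re.findall(r"\w+", word)
explicit_match = re.findall(r"\w+", word, re.U)

# re.ASCII (re.A) limits \w to [a-zA-Z0-9_], so the match stops before é.
ascii_match = re.findall(r"\w+", word, re.A)
```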