Quote:
Originally Posted by DNSB
If you are dealing with Unicode and using Python 2, (?u) would be useful: it makes \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character database, and it makes ignorecase use non-ASCII matching. It's likely documented elsewhere too, but check 7.2. re — Regular expression operations for more information. Please note that it is not the same as ? and is not used the same way. (?u) says to treat the pattern and input as Unicode, so it modifies how the input and pattern are treated but is not part of those strings.
So use something like (?u)(.*?) instead of (.*?) if you want to match on Unicode.
OTOH, I vaguely remember that Python 3 matches on Unicode by default, making (?u) and its equivalents (re.U, re.UNICODE) obsolete.
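A minimal Python 3 sketch of that last point. It only covers Python's own re module, not Sigil's engine, and Python 2 behaviour is not shown. The sample string is arbitrary. For str patterns, \w and ignorecase are Unicode-aware without any flag, (?u) is accepted but changes nothing, and (?a) (re.ASCII) is the way back to ASCII-only matching:

```python
import re

text = "café naïve"

# Python 3 str patterns are Unicode-aware by default: \w matches é and ï.
default_words = re.findall(r"\w+", text)

# (?u) is still accepted for str patterns, but it is redundant.
with_u_flag = re.findall(r"(?u)\w+", text)

# (?a), the inline form of re.ASCII, restricts \w to ASCII letters/digits/underscore.
ascii_words = re.findall(r"(?a)\w+", text)

# Case-insensitive matching also folds non-ASCII letters (É vs é).
ignorecase_hit = re.fullmatch(r"(?i)CAFÉ", "café") is not None

print(default_words)   # ['café', 'naïve']
print(with_u_flag)     # ['café', 'naïve']
print(ascii_words)     # ['caf', 'na', 've']
print(ignorecase_hit)  # True
```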
Now I'm really confused. I thought (?U) was a minimal-match thing, related to making something not greedy.
This is in reference to the Minimal Match checkbox in Sigil's Find/Replace widget, and also to the default example Saved Search for promoting/demoting headings, which adds (?sU) as a prefix:
Find: (?sU)<h2([^>]*>.*)</h2>
Replace: <h1\1</h1>
I'm trying to learn whether an equivalent to the above Find might be:
(?s)<h2([^>]*>.*?)</h2>
If they're not equivalent, what's the difference and/or advantage of using (?U) vs .*? here?
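A rough Python sketch for comparing the two Finds. Python's re has no ungreedy (?U) modifier, so the (?sU) version is emulated by hand. The sketch assumes Sigil follows PCRE's (?U) semantics, where every plain quantifier becomes lazy (and a trailing ? makes it greedy again). Under that assumption, (?sU)<h2([^>]*>.*)</h2> behaves like (?s)<h2([^>]*?>.*?)</h2>. The HTML sample is made up for illustration:

```python
import re

html = '<h2 class="a">One</h2>\n<p>text</p>\n<h2>Two</h2>'
replacement = r"<h1\1</h1>"

# Plain greedy .* with DOTALL: runs to the LAST </h2>, swallowing both headings.
greedy = re.sub(r"(?s)<h2([^>]*>.*)</h2>", replacement, html)

# Lazy .*?: stops at the FIRST </h2> after each <h2, promoting each heading separately.
lazy = re.sub(r"(?s)<h2([^>]*>.*?)</h2>", replacement, html)

# Hand-made stand-in for PCRE's (?sU): every quantifier made lazy, so [^>]* becomes [^>]*?.
emulated_u = re.sub(r"(?s)<h2([^>]*?>.*?)</h2>", replacement, html)

print(greedy)
print(lazy)
print(lazy == emulated_u)  # True
```

[^>] can never match '>', so a lazy [^>]*? still has to consume everything up to the first '>'. Only the .* part actually changes behaviour, which is why the two Finds give the same result on this sample.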
Thank you