Thread: Regex examples
08-05-2019, 01:15 AM #593
odamizu
just an egg
Posts: 1,840
Karma: 8006346
Join Date: Mar 2015
Device: Kindle, iOS
Quote:
Originally Posted by DNSB
If you are dealing with Unicode and using Python 2, (?u) would be useful: it makes \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character database, and it lets ignorecase match non-ASCII characters. It is probably documented elsewhere too, but check 7.2. re — Regular expression operations for more information. Please note that it is not the same as the ? quantifier and is not used the same way: (?u) tells the engine to treat the pattern and input as Unicode, so it changes how both are handled but is not itself part of what gets matched.

So use something like (?u)(.*?) instead of (.*?) if you want to match on Unicode.

OTOH, I vaguely remember that Python 3 matches on Unicode by default for str patterns, making (?u) and its equivalents (re.U, re.UNICODE) redundant.
Now I'm really confused. I thought (?U) was a minimal-match option, i.e. something that makes matching non-greedy.
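A rough Python 3 sketch that separates the two letters (the sample word is made up): lowercase (?u) is Python's Unicode flag and is redundant for str patterns in Python 3, while Python's re has no uppercase (?U) flag at all and rejects it. As far as I know, the "ungreedy" meaning of (?U) comes from PCRE, the engine behind Sigil's Find/Replace, not from Python.

```python
import re

word = "café"  # hypothetical sample text

# Python 3 str patterns are Unicode-aware by default: \w matches "é".
default_match = re.match(r"\w+", word).group()

# Lowercase (?u) is still accepted for str patterns, but changes nothing.
explicit_u = re.match(r"(?u)\w+", word).group()

# (?a), i.e. re.ASCII, opts back out, so \w stops before "é".
ascii_only = re.match(r"(?a)\w+", word).group()

# Case-insensitive matching covers non-ASCII letters by default.
ignorecase_ok = re.fullmatch(r"(?i)CAFÉ", word) is not None

# Inline flag letters are case-sensitive: Python's re has no (?U) flag.
try:
    re.compile(r"(?U).*")
    uppercase_u_supported = True
except re.error:
    uppercase_u_supported = False

print(default_match, explicit_u, ascii_only, ignorecase_ok, uppercase_u_supported)
# café café caf True False
```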

This is in reference to the Minimal Match checkbox in Sigil's Find/Replace widget, and also to the example Saved Search for promoting/demoting headings, which adds (?sU) as a prefix:

Find: (?sU)<h2([^>]*>.*)</h2>
Replace: <h1\1</h1>
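Since Python's re has no ungreedy flag, one way to see what the Saved Search does is to translate it by hand. This is a sketch under the assumption that Sigil's (?U) behaves like PCRE's ungreedy option, which makes every quantifier lazy, so both [^>]* and .* get a trailing ?. The sample markup is invented for illustration; \1 carries any attributes plus the heading text into the new <h1> tag.

```python
import re

# Hypothetical sample chapter markup.
html = '<h2>One</h2>\n<p>Some text.</p>\n<h2 class="sub">Two</h2>'

# Sigil's Find (PCRE): (?sU)<h2([^>]*>.*)</h2>
# Hand translation for Python: (?s) kept, U emulated by making
# every quantifier lazy.
translated = r"(?s)<h2([^>]*?>.*?)</h2>"

# Same Replace as the Saved Search: <h1\1</h1>
promoted = re.sub(translated, r"<h1\1</h1>", html)
print(promoted)
# <h1>One</h1>
# <p>Some text.</p>
# <h1 class="sub">Two</h1>
```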

I'm trying to find out whether the following would be equivalent to the above Find:

(?s)<h2([^>]*>.*?)</h2>

If they're not equivalent, what's the difference, and is there any advantage to using (?U) rather than .*? here?
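A small Python check of the equivalence question, assuming (as above) that (?U) in Sigil makes every quantifier lazy. The reasoning it tests: [^>]* can never consume a ">", so whether it is greedy or lazy it has to stop at the same ">", and only the laziness of .* actually matters. The test strings are made up; the last line shows what goes wrong if .* stays greedy.

```python
import re

# Hand translation of (?sU)<h2([^>]*>.*)</h2>: every quantifier lazy.
ungreedy_translation = r"(?s)<h2([^>]*?>.*?)</h2>"
# The proposed alternative: only .* made lazy.
proposed = r"(?s)<h2([^>]*>.*?)</h2>"
# Neither U nor ?: .* stays greedy.
greedy = r"(?s)<h2([^>]*>.*)</h2>"

replacement = r"<h1\1</h1>"

# Hypothetical inputs, including a heading that spans two lines.
samples = [
    "<h2>One</h2>",
    '<h2 id="c1">One</h2>\n<p>x</p>\n<h2 id="c2">Two</h2>',
    "<h2>Multi\nline</h2><h2>Next</h2>",
]

for s in samples:
    # Both patterns stop at the first ">" and the first "</h2>".
    assert re.sub(ungreedy_translation, replacement, s) == re.sub(proposed, replacement, s)

# With a greedy .*, a single match runs from the first <h2 to the last </h2>.
print(re.sub(greedy, replacement, samples[1]))
# <h1 id="c1">One</h2>
# <p>x</p>
# <h2 id="c2">Two</h1>
```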

Thank you

Last edited by odamizu; 08-06-2019 at 01:06 AM. Reason: correction: example saved search, not default