I am a big fan of Regular Expressions, and I have recently started to use Calibre for more than just a conversion now and then.
I will keep a close eye on this thread.
Regular Expressions are very powerful stuff and deserve to be popularized a little bit more.
Originally Posted by Manichean
I don't see that in this example.
?, + and * are called quantifiers, because they quantify whatever lies before them.
The very first thing a beginner needs to know about these standard quantifiers, which you will see in any RE implementation, is that they are GREEDY.
Yes, there are also non-greedy quantifiers, as one of the previous posters pointed out. In Python syntax those are *?, +?, ??.
Yes, there are *many* different syntaxes for Regular Expressions. I won't go further, I do not want to scare our dear readers away ;-)
Greedy means that the '*' quantifier will eat as much of the string as it can.
Let's have an example. You have the string
'AuthorFirstName AuthorLastName - series - title.epub'
and you want to match 'AuthorFirstName AuthorLastName -'. So, you write an expression like
'.*-'
to match the author. But! '.' matches any character and the '*' quantifier takes as much as possible, so instead of matching 'AuthorFirstName AuthorLastName -' as you intended, you will match 'AuthorFirstName AuthorLastName - series -'.
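The greedy behaviour is easy to check in Python. This is only a minimal sketch: the pattern '.*-' is my assumption for the kind of expression meant above, and the variable names are made up for illustration.

```python
import re

# Example file name from the post.
filename = "AuthorFirstName AuthorLastName - series - title.epub"

# Greedy: '.*' first grabs the whole string, then gives characters back
# only until a '-' can be matched, which is the LAST '-' in the string.
greedy = re.match(r".*-", filename).group(0)

print(greedy)  # AuthorFirstName AuthorLastName - series -
```

So the match runs through to the second dash, not the first.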
You need to search for
'[^-]*-'
instead. '[^-]' means match ANY character BUT '-'.
If the first character in a '[...]' group is '^', the rest of the group is effectively a list of characters that are NOT supposed to be matched.
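Here is a small Python sketch of that fix, next to the non-greedy quantifier mentioned earlier. The exact patterns '[^-]*-' and '.*?-' are my assumptions for illustration; adapt them to your own file names.

```python
import re

# Example file name from the post.
filename = "AuthorFirstName AuthorLastName - series - title.epub"

# Negated group: '[^-]*' cannot cross a '-', so the match has to stop
# at the FIRST '-' in the string.
negated = re.match(r"[^-]*-", filename).group(0)

# Non-greedy quantifier: '.*?' takes as little as possible, so it also
# stops at the first '-'.
lazy = re.match(r".*?-", filename).group(0)

print(negated)  # AuthorFirstName AuthorLastName -
print(lazy)     # AuthorFirstName AuthorLastName -
```

Both variants stop too early if the author's name itself contains a hyphen, so check your file names before relying on either.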
I very, *very* strongly recommend THE best^H^H^H^Hmost exhaustive (pun intended) book ever written about Regular Expressions - Mastering Regular Expressions by Jeffrey Friedl, published by O’Reilly.
Please see http://docs.python.org/library/re.html for a recommendation about which edition of the book to use.
The book is difficult, but worth its weight in gold if you want to understand Regular Expressions.