Great work.
I am a big fan of Regular Expressions, and I have recently started to use Calibre for more than just the occasional conversion.
I will keep a close eye on this thread.
Regular Expressions are very powerful stuff and deserve to be popularized a little bit more.
Quote:
Originally Posted by Manichean
I don't see that in this example.
?, + and * are called quantifiers, because they quantify whatever lies before them.
The very first thing a beginner needs to know about these standard quantifiers, which you will find in any RE implementation, is that they are GREEDY.
Yes, there are also non-greedy quantifiers, as one of the previous posters pointed out. In Python syntax those are *?, +?, ??.
Yes, there are *many* different syntaxes for Regular Expressions. I won't go further, I do not want to scare our dear readers away ;-)
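If you want to try the greedy and non-greedy forms at a Python prompt, here is a small sketch. The sample string 'aaa' and the variable names are made up for illustration; only the standard re module is used.

```python
import re

text = 'aaa'  # made-up sample string, not from any real file name

# Greedy quantifiers take as much as they can.
greedy_plus = re.match(r'a+', text).group()
greedy_star = re.match(r'a*', text).group()
greedy_opt = re.match(r'a?', text).group()

# Non-greedy quantifiers take as little as they can.
lazy_plus = re.match(r'a+?', text).group()
lazy_star = re.match(r'a*?', text).group()
lazy_opt = re.match(r'a??', text).group()

print(repr(greedy_plus), repr(lazy_plus))   # 'aaa' 'a'
print(repr(greedy_star), repr(lazy_star))   # 'aaa' ''
print(repr(greedy_opt), repr(lazy_opt))     # 'a' ''
```

Note that the non-greedy '*?' and '??' are happy to match nothing at all when nothing after them forces them to take more.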
A '*' quantifier will eat as much of the string as it can.
Let's have an example. You have the string
'AuthorFirstName AuthorLastName - series - title.epub'
and you want to match 'AuthorFirstName AuthorLastName - '. So you write an expression like
'.* - '
to match the author. But! '.' matches any character and the '*' quantifier takes as much as possible, so instead of matching 'AuthorFirstName AuthorLastName - ' as you intended, you will match
'AuthorFirstName AuthorLastName - series - '
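The greedy behaviour is easy to check with Python's re module. This is only a quick sketch using the example file name from this post; the variable names are my own.

```python
import re

filename = 'AuthorFirstName AuthorLastName - series - title.epub'

# '.*' is greedy: it first swallows the whole string, then gives back
# only as much as needed for ' - ' to match, which lands on the LAST ' - '.
greedy = re.match(r'.* - ', filename).group()

print(repr(greedy))  # 'AuthorFirstName AuthorLastName - series - '
```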
You need to search for
'[^-]* - '
'[^-]' means match ANY character BUT '-'.
If the first character inside the square brackets is '^', the rest of the set is effectively a list of characters that are NOT supposed to be matched.
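Here is a short sketch of the negated set in Python, again with the example file name from this post. The second file name ('Jean-Paul Sartre - Nausea.epub') is a hypothetical one I added to show a limitation; it is not from the thread.

```python
import re

filename = 'AuthorFirstName AuthorLastName - series - title.epub'

# '[^-]*' cannot run past the first '-', so the match ends at the first ' - '.
author = re.match(r'[^-]* - ', filename).group()
print(repr(author))  # 'AuthorFirstName AuthorLastName - '

# Hypothetical file name with a hyphen inside the author's name.
hyphenated = 'Jean-Paul Sartre - Nausea.epub'

# The negated set refuses to cross the '-' in 'Jean-Paul', so nothing matches.
no_match = re.match(r'[^-]* - ', hyphenated)
print(no_match)  # None

# A non-greedy '.*?' stops at the first ' - ' and does not mind the hyphen.
lazy = re.match(r'.*? - ', hyphenated).group()
print(repr(lazy))  # 'Jean-Paul Sartre - '
```

So '[^-]* - ' works well as long as the author part itself contains no hyphen; if it might, the non-greedy form is one possible alternative.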
I very, *very* strongly recommend THE best^H^H^H^Hmost exhaustive (pun intended) book ever written about Regular Expressions: Mastering Regular Expressions by Jeffrey Friedl, published by O’Reilly.
Please see http://docs.python.org/library/re.html for the recommendation about which edition of the book to use.
The book is difficult, but worth its weight in gold if you want to understand Regular Expressions.