09-20-2010, 07:20 AM #10
kacir
Posts: 2,735
Karma: 2899223
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
Great work.
I am a big fan of Regular Expressions, and I have recently started to use Calibre for things other than just a conversion now and then.

I will keep a close eye on this thread.
Regular Expressions are very powerful and deserve to be popularized a little more.

Quote:
Originally Posted by Manichean
I don't see that in this example.
'?', '+' and '*' are called quantifiers because they quantify whatever immediately precedes them.
The very first thing a beginner needs to know about these standard quantifiers, which you will find in every RE implementation, is that they are GREEDY.
Yes, there are also non-greedy quantifiers, as one of the previous posters pointed out. In Python syntax those are *?, +? and ??.
Yes, there are *many* different syntaxes for Regular Expressions. I won't go further; I do not want to scare our dear readers away ;-)
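If you want to see the greedy/non-greedy difference for yourself, here is a minimal sketch using Python's standard re module. The test string 'aaa' is just an arbitrary choice; every pattern is anchored at the start of the string by re.match().

```python
import re

text = "aaa"

# Greedy forms (*, +, ?) take as many 'a's as they can;
# non-greedy forms (*?, +?, ??) take as few as they can.
patterns = ["a*", "a+", "a?", "a*?", "a+?", "a??"]

results = {p: re.match(p, text).group() for p in patterns}

for p, matched in results.items():
    print(f"{p!r} matches {matched!r}")
```

Greedy '*' grabs all three characters, while lazy '*?' is perfectly happy with the empty string, because zero repetitions already satisfies it.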

A '*' quantifier will eat as much of the string as it can, giving characters back only when the rest of the pattern would otherwise fail to match.
Let's take an example. You have the string
'AuthorFirstName AuthorLastName - series - title.epub'
and you want to match 'AuthorFirstName AuthorLastName - '. So you write an expression like:
'.* - ' to match the author. But! '.' matches any character and the '*' quantifier takes as much as possible, so instead of matching 'AuthorFirstName AuthorLastName - ' as you intended, you will match 'AuthorFirstName AuthorLastName - series - '
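A quick way to reproduce this trap in Python (just a sketch with the standard re module, using the example file name from above):

```python
import re

filename = "AuthorFirstName AuthorLastName - series - title.epub"

# '.*' first runs all the way to the end of the string, then backtracks
# only as far as needed to find ' - '. That is the LAST ' - ', not the first.
greedy = re.match(r".* - ", filename).group()
print(repr(greedy))
```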

You need to search for
'[^-]* - '
'[^-]' means match ANY character BUT '-'

If the first character inside the square brackets (a character class) is '^', the rest of the class is effectively a list of characters that are NOT supposed to be matched.
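Here is a small Python sketch of the fix, plus the non-greedy alternative mentioned earlier. The full split with named groups, the group names author/series/title, and the 'Anne-Marie Author' file name are my own illustrative choices, not any fixed Calibre convention.

```python
import re

filename = "AuthorFirstName AuthorLastName - series - title.epub"

# '[^-]*' can never run past the first '-', so the match stops at the author.
author_part = re.match(r"[^-]* - ", filename).group()

# The non-greedy '*?' also stops at the FIRST ' - '.
lazy_part = re.match(r".*? - ", filename).group()

# Hypothetical extension: split the whole file name with named groups.
m = re.match(r"(?P<author>[^-]*) - (?P<series>[^-]*) - (?P<title>.*)\.epub$", filename)
parts = m.groupdict()

# Caveat: a hyphen inside the name stops '[^-]*' too early, so the
# negated-class pattern fails here, while the lazy pattern still works.
tricky = "Anne-Marie Author - series - title.epub"
negated_tricky = re.match(r"[^-]* - ", tricky)  # no match at all
lazy_tricky = re.match(r".*? - ", tricky).group()

print(author_part, lazy_part, parts, negated_tricky, lazy_tricky, sep="\n")
```

Which one to prefer depends on your file names: '[^-]' is explicit about what may not appear, while '.*? - ' only cares about the first space-hyphen-space.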


I very, *very* strongly recommend THE best^H^H^H^Hmost exhaustive (pun intended) book ever written about Regular Expressions: Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly.
Please see http://docs.python.org/library/re.html for a recommendation about which edition of the book to use.
The book is difficult, but worth its weight in gold if you want to understand Regular Expressions.