02-22-2011, 03:19 PM   #9
meads
Quote:
Originally Posted by cybmole
I am wanting to understand how exactly this works:
(?P<author>.+) - (?P<title>[^_]+)

I have followed the link to the tutorial & looked at other regex syntax summaries but am still not getting it

I don't see a definition of ?P anywhere, for example

could someone kindly break down how the above code works, please.

The ( ) create a RegEx "group". A group can be referenced in a replacement string by number, \1 or \2, in other words by the order in which the ( ) appear in the pattern. ALSO, if you put a name in the ( ), like ?P<author>, you can refer to the group by that name in the replacement string (in Python, \g<author>)!
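To make the numbered and named references concrete, here is a minimal Python sketch. The file name "Jane Austen - Emma" and the "title by author" output format are made up for illustration; only the pattern comes from the question.

```python
import re

# The pattern from the question, compiled once for reuse.
pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

name = "Jane Austen - Emma"  # hypothetical file name, for illustration only

# Numbered references: \1 is the first ( ), \2 is the second.
by_number = pattern.sub(r"\2 by \1", name)

# Named references: \g<title> is the group that was named "title".
by_name = pattern.sub(r"\g<title> by \g<author>", name)

print(by_number)
print(by_name)
```

Both calls should produce the same string, since the names are just labels for group 1 and group 2.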

So (?P<author>.+) - (?P<title>[^_]+) has 2 named groups. The first group is named "author". The period means any character (except a line break) can be part of the author group, and the + means there must be at least ONE such character. Between the two groups sits the literal text " - " (space, dash, space). Because .+ is greedy, the author group does not stop at the first dash: it keeps building up to the LAST " - " that still leaves something for the title, so "A - B - C" gives the author "A - B". The second group is named "title". The characters allowed are defined by the brackets [ ]. A ^ right after the opening bracket says NO to whatever follows it, so [^_] means any character EXCEPT the underscore _, and the title group stops at the first underscore. The + again means that there must be at least ONE character for the group to be built.
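A short Python sketch of how the two groups fill up on a few inputs. The sample strings and the helper name split_name are hypothetical, and re.match is assumed as the way the pattern is applied (it anchors at the start of the text but not at the end).

```python
import re

pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

def split_name(text):
    """Return (author, title) if the pattern matches at the start, else None."""
    m = pattern.match(text)
    return (m.group("author"), m.group("title")) if m else None

# One separator: everything before " - " is the author, the rest is the title.
simple = split_name("Jane Austen - Emma")

# Two separators: the greedy .+ runs up to the LAST " - ".
two_dashes = split_name("A - B - C")

# The title group stops at the first underscore.
with_underscore = split_name("Jane Austen - Emma_v2")

# No " - " at all: the pattern cannot match.
no_separator = split_name("Jane Austen")

print(simple, two_dashes, with_underscore, no_separator)
```

If the author should instead stop at the first " - ", the non-greedy form .+? is the usual alternative to try.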

The P in (?P<group name>...) is a Python extension to the regex syntax; it tells Python that the group is to have a name.
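As a small sketch of what the name buys you in Python itself: a named group is still a numbered group, and the match object can hand back all named groups at once. The sample string is again made up.

```python
import re

m = re.match(r"(?P<author>.+) - (?P<title>[^_]+)", "Jane Austen - Emma")

# groupdict() collects every named group into a dictionary.
named = m.groupdict()

# A named group keeps its position number as well.
same_group = m.group(1) == m.group("author")

print(named, same_group)
```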

Two references I am using are:
Regular Expressions in 10 Minutes by Ben Forta
Regular Expression Pocket Reference by Tony Stubblebine

other books on RegEx:
Mastering Regular Expressions by Jeffrey E. F. Friedl
Regular Expressions Cookbook by Jan Goyvaerts & Steven Levithan
Beginning Regular Expressions by Andrew Watt

hope that helps.