Kacir,
Quote:
(?P<author>[^-]+)( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)? - (?P<title>.+)
Thank you for this expression, which proved very useful to me. I'd like to take you up on your offer to explain it in detail. I'm very new to this game, so please excuse my ignorance in advance.
This is what I make of it:
(?P<author>[^-]+) matches any run of one or more characters other than the hyphen (-) and makes that string the 'author' field.
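A quick check in Python's re module, which uses the same (?P<name>...) syntax for named groups, is consistent with that reading. The sample filename is made up, and matching from the start of the name with re.match is an assumption about how the expression is applied. It also shows a subtlety: on its own, [^-]+ grabs the space before the hyphen too, but inside the full expression the engine backtracks and gives that space back so the literal " - " can match.

```python
import re

# Made-up sample filename (assumption: "Author - Title" layout, no extension).
name = "Jane Doe - The Title"

# The author piece on its own: one or more characters that are not "-".
author_only = re.match(r"(?P<author>[^-]+)", name)
print(repr(author_only.group("author")))  # 'Jane Doe ' (stops just before the hyphen)

# The full expression from the quote above.
PATTERN = re.compile(
    r"(?P<author>[^-]+)"
    r"( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)?"
    r" - (?P<title>.+)"
)
full = PATTERN.match(name)
print(repr(full.group("author")))  # 'Jane Doe' (trailing space handed back to " - ")
```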
I don't get the [^-]. Wouldn't that prevent the author's name from containing a hyphen? Why not simply use (?P<author>.+), as you do for the title?
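Trying both versions on a made-up filename suggests one reason for [^-]+: with .+ the author group is greedy and, since nothing stops it at a hyphen, it swallows the series part. The cost is the one suspected above: a hyphenated name breaks the match. Whether that gives no match at all or a truncated author depends on whether the program anchors the match at the start of the name (re.match) or searches anywhere in it (re.search); which one is used is an assumption, so the sketch shows both.

```python
import re

# Series group and title exactly as in the quoted expression.
REST = r"( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)? - (?P<title>.+)"

original = re.compile(r"(?P<author>[^-]+)" + REST)
greedy = re.compile(r"(?P<author>.+)" + REST)  # hypothetical variant asked about

name = "Jane Doe - Saga 2 - The Title"  # made-up sample
for label, pat in (("[^-]+", original), (".+", greedy)):
    m = pat.match(name)
    print(label, m.group("author", "series", "series_index", "title"))
# [^-]+ ('Jane Doe', 'Saga', '2', 'The Title')
# .+ ('Jane Doe - Saga 2', None, None, 'The Title')

# The trade-off: a hyphenated name cannot be matched from the start at all.
hyphenated = "Jean-Paul Sartre - Nausea"
print(original.match(hyphenated))                   # None
print(original.search(hyphenated).group("author"))  # Paul Sartre
```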
( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)?
The whole expression between the first and the last parenthesis is followed by a question mark. Does this question mark mean that the whole expression may appear either zero times or once, so that we can handle two kinds of books (those with a series and those without)?
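A sketch with two made-up filenames, again using Python's re module with re.match as an assumed way of applying the expression, is consistent with that reading: when the optional group is skipped, the named groups inside it come back as None.

```python
import re

PATTERN = re.compile(
    r"(?P<author>[^-]+)"
    r"( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)?"  # the trailing ? makes this group optional
    r" - (?P<title>.+)"
)

# Made-up sample filenames, one with a series and one without.
for name in ("Jane Doe - Saga 3 - The Title", "Jane Doe - The Title"):
    m = PATTERN.match(name)
    print(m.groupdict())
# {'author': 'Jane Doe', 'series': 'Saga', 'series_index': '3', 'title': 'The Title'}
# {'author': 'Jane Doe', 'series': None, 'series_index': None, 'title': 'The Title'}
```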
Again, I don't get the [^-].
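For the series, [^-]+ appears to keep the series name inside a single hyphen-delimited field. The sketch below builds the expression twice, once as quoted and once with a hypothetical .+ in place of [^-]+ for the series, and runs both (via re.match, an assumption) on a made-up filename whose title also contains a number.

```python
import re

AUTHOR = r"(?P<author>[^-]+)"
TITLE = r" - (?P<title>.+)"

def series_pattern(series_class):
    """Build the quoted expression with a chosen sub-pattern for the series name."""
    return re.compile(
        AUTHOR
        + r"( - \[?(?P<series>" + series_class + r")(\[| )+(?P<series_index>[0-9]+)\]?)?"
        + TITLE
    )

name = "Jane Doe - Saga 1 - Chapter 5 - Finale"  # made-up sample
for series_class in ("[^-]+", ".+"):
    m = series_pattern(series_class).match(name)
    print(series_class, m.group("series", "series_index", "title"))
# [^-]+ ('Saga', '1', 'Chapter 5 - Finale')
# .+ ('Saga 1 - Chapter', '5', 'Finale')
```

With .+ the greedy series name runs across a " - " and picks up the second number as the index.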
I'm also not sure about the + in (\[| )+. Is it there to handle an accidental duplication of either a left bracket or a space before the series index?
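One way to see what (\[| )+ actually consumes is to print the text between the series name and the index for a few made-up spellings of the series part (Python's re module, re.match assumed). In these samples the + never ends up matching more than one character: because the greedy [^-]+ in front of it can also match spaces and brackets, a doubled space or a " [" pair lands partly in the series name instead, leaving a trailing space there.

```python
import re

PATTERN = re.compile(
    r"(?P<author>[^-]+)"
    r"( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)?"
    r" - (?P<title>.+)"
)

# Made-up filenames showing different ways of writing the series part.
samples = [
    "Jane Doe - Saga 2 - Title",
    "Jane Doe - Saga [2] - Title",
    "Jane Doe - [Saga 2] - Title",
    "Jane Doe - Saga  2 - Title",  # two spaces before the index
]
for name in samples:
    m = PATTERN.match(name)
    # Text consumed by (\[| )+ : everything between the series name and the index.
    sep = name[m.end("series"):m.start("series_index")]
    print(repr(m.group("series")), repr(sep), m.group("series_index"))
# 'Saga' ' ' 2
# 'Saga ' '[' 2
# 'Saga' ' ' 2
# 'Saga ' ' ' 2
```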
- (?P<title>.+)
This part seems clear enough: after the space-hyphen-space sequence, all remaining characters are the title. However, since [^-] was used for the author and the series, why not use it here as well?
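A quick test with a made-up filename whose title itself contains " - " (Python's re module, re.match assumed) suggests why the title keeps .+: by the time the title starts, the author and the optional series have already been pinned down, so .+ can take everything that remains, hyphens included, while a hypothetical [^-]+ would cut the title off at its first hyphen.

```python
import re

HEAD = r"(?P<author>[^-]+)( - \[?(?P<series>[^-]+)(\[| )+(?P<series_index>[0-9]+)\]?)?"

original = re.compile(HEAD + r" - (?P<title>.+)")
restricted = re.compile(HEAD + r" - (?P<title>[^-]+)")  # hypothetical variant asked about

name = "Jane Doe - Saga 1 - The Title - A Subtitle"  # made-up sample
print(repr(original.match(name).group("title")))    # 'The Title - A Subtitle'
print(repr(restricted.match(name).group("title")))  # 'The Title '
```

So the title is not necessarily what follows the last " - ": it is whatever follows the " - " after the author (or after the series part, when there is one).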
A clarification would be most welcome. Thanks! W.