08-20-2011, 06:02 AM   #10
mightymouse2045
Quote:
Originally Posted by kacir
OK.
Let's have a look at the RE from my previous post, split across several lines for better readability
Code:
(?P<author>[^-]+) 
( 
        ( - | *-- *) 
        [[(]? 
        (?P<series>[^-]+) 
        [[( ]+ 
        (?P<series_index>[0-9.]+)? 
        [])]? 
)? 
( - | *-- *) 
(?P<title>.+)
Now, we shall rearrange various elements like so
Code:
(
        [[(]?
        (?P<series>[^-]+)
        [[( ]+
        (?P<series_index>[0-9.]+)? 
        [])]? 
        ( - | *-- *) 
)? 
(?P<author>[^-]+) 
( - | *-- *) 
(?P<title>.+)
Now it matches
series seriesnumber - author - title
author - title
Please note: if there is a series, it must be followed by a series number.
I think it is possible to construct an RE that makes the series number optional, but I do not know if it would be useful that way, and my regular expression is complicated enough as it is.
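As a rough sketch, the rearranged pattern above can be tried in plain Python by keeping the multi-line layout and compiling it with re.VERBOSE. This assumes Python's re module behaves like the matcher Calibre uses; the sample names are made up. In verbose mode, whitespace outside a character class is ignored, so each literal space in a delimiter is written as "[ ]", and the square brackets inside character classes are escaped.

```python
import re

# Rearranged pattern, laid out one element per line.
# re.VERBOSE ignores whitespace outside character classes,
# so literal spaces in the delimiters are written as "[ ]".
REARRANGED = re.compile(r"""
    (
        [\[(]?                       # optional opening bracket
        (?P<series>[^-]+)            # series name
        [\[( ]+                      # separator before the number
        (?P<series_index>[0-9.]+)?   # series number
        [\])]?                       # optional closing bracket
        ([ ]-[ ] | [ ]*--[ ]*)       # delimiter
    )?
    (?P<author>[^-]+)
    ([ ]-[ ] | [ ]*--[ ]*)
    (?P<title>.+)
""", re.VERBOSE)

def parse(name):
    """Return the named groups for a file name, or None if nothing matches."""
    m = REARRANGED.match(name)
    return m.groupdict() if m else None

# Made-up sample names, for illustration only.
print(parse("Foundation 1 - Isaac Asimov - Prelude"))
print(parse("Isaac Asimov - Prelude"))
print(parse("Foundation - Isaac Asimov - Prelude"))
```

The third sample has a series without a number, so the optional series block does not match; the first field ends up in author and everything after the first delimiter ends up in title.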

Let's add the regular expression
[0-9 ]*
at the beginning of the new RE, so it "eats up" any numbers and spaces at the start of the file name.
If there are dots in the number, put this at the beginning instead
[0-9. ]*
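To see what each prefix consumes, here is a small sketch that applies the prefix on its own. The helper name strip_prefix and the sample names are made up for illustration; in the real RE the prefix is simply the first element of the pattern.

```python
import re

def strip_prefix(name, allow_dots=False):
    """Drop the leading run of digits and spaces (and dots, if allowed)."""
    prefix = r"[0-9. ]*" if allow_dots else r"[0-9 ]*"
    # The prefix can match an empty string, so re.match always succeeds.
    m = re.match(prefix, name)
    return name[m.end():]

print(strip_prefix("01 Dune 1 - Frank Herbert - Dune"))
print(strip_prefix("1.5 Dune 1 - Frank Herbert - Dune"))
print(strip_prefix("1.5 Dune 1 - Frank Herbert - Dune", allow_dots=True))
```

Without the dot in the character class, the second call stops at the first dot and leaves ".5 Dune 1 - ..." behind, which is why the dotted variant is needed for such names. Note that either prefix would also eat a series name that itself starts with digits.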

--------- doesn't work ---------
Now, we need to add the underscore to the possible delimiters, together with ' - ', '--', ' -- '.
So instead of
( - | *-- *)
at the end of the series, we put
( - | *-- *| *_ *)
Now the possible delimiters are ' - ', '--', ' -- ', '-- ', ' --', '_', ' _', '_ ', ' _ '.
-------- end of doesn't work -------
The above construction doesn't work, because you would also have to change (?P<series>[^-]+) to (?P<series>[^-_]+). An even bigger problem is that Calibre automatically replaces underscores in filenames with spaces. Is there a way to switch that behaviour off?

I recommend replacing underscores with ' - ' in filenames before processing the files in Calibre.
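One way to do that renaming is a short script run before the import. This is only a sketch: the function names are hypothetical, it treats every underscore in the name as a delimiter, and it should be tried on a copy of the files first.

```python
import os
import re

def underscore_to_dash(filename):
    """Return filename with each underscore delimiter rewritten as ' - '."""
    stem, ext = os.path.splitext(filename)
    # " *_ *" also swallows spaces around the underscore,
    # so "a_b", "a _b" and "a _ b" all become "a - b".
    return re.sub(r" *_ *", " - ", stem) + ext

def rename_all(directory):
    """Rename every file in directory whose name contains an underscore."""
    for old in os.listdir(directory):
        new = underscore_to_dash(old)
        if new != old:
            os.rename(os.path.join(directory, old),
                      os.path.join(directory, new))

print(underscore_to_dash("Dune 1_Frank Herbert _ Dune.epub"))
```

rename_all is defined but not called here; point it at a folder of your own once the printed result looks right.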

Here is the result
Code:
[0-9 ]*([[(]?(?P<series>[^-]+)[[( ]+(?P<series_index>[0-9.]+)?[])]?( - | *-- *))?(?P<author>[^-]+)( - | *-- *)(?P<title>.+)
I will leave extensive testing of the regular expression as an exercise for the reader ;-)
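As a starting point for that testing, here is a minimal sketch that runs the one-line RE against a few made-up names in plain Python. It assumes Python's re module matches the way Calibre does; the square brackets inside character classes are escaped, which matches the same characters and avoids the "possible nested set" warning that newer Python versions can emit.

```python
import re

# The one-line pattern from this post, with "[" and "]" escaped
# inside the character classes (same characters are matched).
FILENAME_RE = re.compile(
    r"[0-9 ]*"
    r"([\[(]?(?P<series>[^-]+)[\[( ]+(?P<series_index>[0-9.]+)?[\])]?( - | *-- *))?"
    r"(?P<author>[^-]+)( - | *-- *)(?P<title>.+)"
)

def parse(name):
    """Return the named groups for a file name, or None if nothing matches."""
    m = FILENAME_RE.match(name)
    return m.groupdict() if m else None

# Made-up sample names, without file extensions.
SAMPLES = [
    "Foundation 1 - Isaac Asimov - Prelude to Foundation",
    "07 [Foundation 2] - Isaac Asimov - Forward the Foundation",
    "Isaac Asimov--Nightfall",
]

for sample in SAMPLES:
    print(sample, "->", parse(sample))
```

Names that use ' -- ' with spaces on both sides are worth checking separately: tracing the pattern by hand suggests the series group can absorb the number in that case, leaving series_index empty.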
Thanks a lot for your explanation, that worked a treat. I can now do the same with anything else in future.