MobileRead Forums - View Single Post

kacir · 08-19-2011, 01:02 PM

Quote:

Originally Posted by mightymouse2045

How about

# title - author
# series ##_ title - author
series ##_ title - author

But strip the number # at the beginning?

OK.
Let's have a look at the RE from my previous post, split across several lines for better readability

Code:

(?P<author>[^-]+) 
( 
        ( - | *-- *) 
        [[(]? 
        (?P<series>[^-]+) 
        [[( ]+ 
        (?P<series_index>[0-9.]+)? 
        [])]? 
)? 
( - | *-- *) 
(?P<title>.+)

Now we rearrange the elements like so

Code:

(
        [[(]?
        (?P<series>[^-]+)
        [[( ]+
        (?P<series_index>[0-9.]+)? 
        [])]? 
        ( - | *-- *) 
)? 
(?P<author>[^-]+) 
( - | *-- *) 
(?P<title>.+)

Now it matches
series seriesnumber - author - title
author - title
Please note: if there is a series, it must be followed by a series number.
I think it is possible to construct an RE that makes the series number optional, but I do not know if it would be useful that way, and my regular expression is complicated enough as it is.
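
The two cases above can be checked quickly in Python. This is only a sketch under a few assumptions: the sample filenames are made up, the helper `parse` is hypothetical, `re.match` is used (Calibre may apply the pattern differently), and the `[` inside the character classes is escaped to avoid Python's nested-set warning, which should not change what is matched.

```python
import re

# Rearranged RE from above, joined onto one line (the spaces are significant).
# Only change: '[' is escaped inside the character classes.
PATTERN = re.compile(
    r"([\[(]?(?P<series>[^-]+)[\[( ]+(?P<series_index>[0-9.]+)?[])]?( - | *-- *))?"
    r"(?P<author>[^-]+)( - | *-- *)(?P<title>.+)"
)

def parse(name):
    """Return the named groups for a filename (without extension), or None."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

# Made-up sample names
for name in (
    "Foundation 2 - Isaac Asimov - Foundation and Empire",
    "Isaac Asimov - I, Robot",
    "Foundation - Isaac Asimov - Foundation and Empire",  # series without a number
):
    print(name, "->", parse(name))
```

The third name has a series but no series number, so the optional series part is skipped and "Foundation" ends up as the author.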

Let's add the regular expression
[0-9 ]*
at the beginning of the new RE, so it "eats up" any numbers and spaces at the beginning.
If there are dots in the number, put this at the beginning instead
[0-9. ]*
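
To see the difference between the two prefixes, here is a small Python sketch. The helper `strip_prefix` and the sample names are hypothetical and only show what each prefix consumes on its own.

```python
import re

def strip_prefix(name, prefix=r"[0-9 ]*"):
    """Return what is left of `name` after the leading-number pattern has matched."""
    # The pattern can match the empty string, so re.match never returns None here.
    return name[re.match(prefix, name).end():]

print(strip_prefix("012 Isaac Asimov - I, Robot"))               # Isaac Asimov - I, Robot
print(strip_prefix("1.5 Isaac Asimov - I, Robot"))               # .5 Isaac Asimov - I, Robot
print(strip_prefix("1.5 Isaac Asimov - I, Robot", r"[0-9. ]*"))  # Isaac Asimov - I, Robot
```

Without the dot in the character class, the prefix stops at the first dot and the rest of the number stays in front of the name.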

--------- doesn't work ---------
Now we need to add the underscore to the possible delimiters, together with ' - ', '--', ' -- '.
So instead of
( - | *-- *)
at the end of the series, we put
( - | *-- *| *_ *)
Now possible delimiters are ' - ', '--', ' -- ', '-- ', ' --', '_',' _','_ ',' _ '.
-------- end of doesn't work -------
The above construction doesn't work, because you would also have to change (?P<series>[^-]+) to (?P<series>[^-_]+). An even bigger problem is that Calibre automatically replaces underscores in filenames with spaces. Is there a setting to switch that off?

I recommend replacing the underscore with ' - ' in the filenames before processing the files in Calibre.
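
One way to do that replacement in bulk is a short Python script. This is a sketch, not a tested tool: the function names are hypothetical, the folder path is whatever you pass in, and it only prints the planned renames unless you set `dry_run=False`. Try it on a copy of your files first.

```python
import re
from pathlib import Path

def underscore_to_dash(stem):
    """Replace each underscore (plus any spaces around it) with ' - '."""
    return re.sub(r" *_ *", " - ", stem)

def rename_in_folder(folder, dry_run=True):
    """Rename files in `folder` whose name (without extension) contains '_'."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or "_" not in path.stem:
            continue
        target = path.with_name(underscore_to_dash(path.stem) + path.suffix)
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)

print(underscore_to_dash("Foundation 02_ Foundation and Empire - Isaac Asimov"))
# Foundation 02 - Foundation and Empire - Isaac Asimov
```

The extension is left alone, and spaces around the underscore are folded into the new delimiter so you do not end up with double spaces.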

Here is the result

Code:

[0-9 ]*([[(]?(?P<series>[^-]+)[[( ]+(?P<series_index>[0-9.]+)?[])]?( - | *-- *))?(?P<author>[^-]+)( - | *-- *)(?P<title>.+)

I will leave extensive testing of the regular expression as an exercise for the reader ;-)
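
As a starting point for that exercise, here is a minimal Python harness. It rests on assumptions: the sample names and expected values are made up, `check` is a hypothetical helper, `re.match` may differ from how Calibre applies the pattern, and the `[` inside the character classes is escaped to avoid Python's nested-set warning.

```python
import re

# Final RE from above; only change: '[' is escaped inside the character classes.
FINAL = re.compile(
    r"[0-9 ]*([\[(]?(?P<series>[^-]+)[\[( ]+(?P<series_index>[0-9.]+)?[])]?( - | *-- *))?"
    r"(?P<author>[^-]+)( - | *-- *)(?P<title>.+)"
)

# filename (without extension) -> expected (series, series_index, author, title)
CASES = {
    "01 Foundation 2 - Isaac Asimov - Foundation and Empire":
        ("Foundation", "2", "Isaac Asimov", "Foundation and Empire"),
    "01 [Foundation 2] - Isaac Asimov - Foundation and Empire":
        ("Foundation", "2", "Isaac Asimov", "Foundation and Empire"),
    "07 Isaac Asimov - I, Robot":
        (None, None, "Isaac Asimov", "I, Robot"),
}

def check(cases=CASES):
    """Return a list of (name, expected, got) for every case that does not match."""
    failures = []
    for name, expected in cases.items():
        m = FINAL.match(name)
        got = m.group("series", "series_index", "author", "title") if m else None
        if got != expected:
            failures.append((name, expected, got))
    return failures

print(check() or "all cases pass")
```

Add your own filenames to `CASES` to see where the expression breaks down for your collection.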