Old 08-19-2011, 01:02 PM   #9
kacir
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
Quote:
Originally Posted by mightymouse2045
How about

# title - author
# series ##_ title - author
series ##_ title - author

But strip the number # at the beginning?
OK.
Let's have a look at the RE from my previous post, split across several lines for better readability:
Code:
(?P<author>[^-]+) 
( 
        ( - | *-- *) 
        [[(]? 
        (?P<series>[^-]+) 
        [[( ]+ 
        (?P<series_index>[0-9.]+)? 
        [])]? 
)? 
( - | *-- *) 
(?P<title>.+)
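Keep in mind that the line breaks and indentation above are only there for reading. The spaces inside ( - | *-- *) are literal, so the pieces have to be joined back into a single line before use (Python's re.VERBOSE flag would discard those spaces). Here is a minimal sketch in Python, whose regex syntax Calibre uses. The sample filenames and the helper name parse are made up for illustration, and the opening brackets inside character classes are escaped only so that newer Python versions do not warn about them.

```python
import re

# The pieces of the expression, one per line as in the post.
PIECES = [
    r"(?P<author>[^-]+)",
    r"(",
    r"( - | *-- *)",
    r"[\[(]?",
    r"(?P<series>[^-]+)",
    r"[\[( ]+",
    r"(?P<series_index>[0-9.]+)?",
    r"[\])]?",
    r")?",
    r"( - | *-- *)",
    r"(?P<title>.+)",
]

# Join without any separator so that the literal spaces survive.
PATTERN = re.compile("".join(PIECES))

FIELDS = ("author", "series", "series_index", "title")


def parse(name):
    """Return the named groups as a dict, or None if nothing matches."""
    m = PATTERN.match(name)
    return {key: m.group(key) for key in FIELDS} if m else None


if __name__ == "__main__":
    # Hypothetical filenames, extension already removed.
    print(parse("Isaac Asimov - Foundation 2 - Foundation and Empire"))
    print(parse("Frank Herbert - Dune"))
```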
Now we rearrange the elements like so:
Code:
(
        [[(]?
        (?P<series>[^-]+)
        [[( ]+
        (?P<series_index>[0-9.]+)? 
        [])]? 
        ( - | *-- *) 
)? 
(?P<author>[^-]+) 
( - | *-- *) 
(?P<title>.+)
Now it matches
series seriesnumber - author - title
author - title
Please note: if there is a series, it must be followed by a series number.
I think it is possible to construct an RE that makes the series number optional, but I do not know if it would be useful that way, and my regular expression is complicated enough as it is.
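To see what happens when the series number is missing, here is a small Python sketch of the rearranged expression, joined into one line. The filenames and the helper name fields are invented for the example, and opening brackets inside character classes are escaped only to keep newer Python versions quiet. With a number the series is picked up, and without one the series name ends up in the author field.

```python
import re

# The rearranged expression from the post, written as one line.
PATTERN = re.compile(
    r"([\[(]?(?P<series>[^-]+)[\[( ]+(?P<series_index>[0-9.]+)?[\])]?( - | *-- *))?"
    r"(?P<author>[^-]+)( - | *-- *)(?P<title>.+)"
)


def fields(name):
    """Return (series, series_index, author, title), or None if no match."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return (m.group("series"), m.group("series_index"),
            m.group("author"), m.group("title"))


if __name__ == "__main__":
    # Series followed by a number: all four fields are found.
    print(fields("Foundation 2 - Isaac Asimov - Foundation and Empire"))
    # Series without a number: the series name is read as the author.
    print(fields("Foundation - Isaac Asimov - Foundation and Empire"))
```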

Let's add the regular expression
[0-9 ]*
at the beginning of the new RE, so it "eats up" any digits and spaces at the beginning.
If there are dots in the number, put this at the beginning instead:
[0-9. ]*
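The difference between the two prefixes is easy to check in isolation. This is only a sketch in Python. The function strip_prefix and its allow_dots switch are hypothetical helpers, not anything Calibre provides.

```python
import re


def strip_prefix(name, allow_dots=False):
    """Drop the leading run of digits and spaces (and dots, if allowed)."""
    pattern = r"[0-9. ]*" if allow_dots else r"[0-9 ]*"
    # A starred class also matches the empty string, so match() never fails.
    return name[re.match(pattern, name).end():]


if __name__ == "__main__":
    print(strip_prefix("01 Foundation 2 - Isaac Asimov - Foundation and Empire"))
    print(strip_prefix("1.5 Foundation"))                    # stops at the dot
    print(strip_prefix("1.5 Foundation", allow_dots=True))   # eats the dot too
```

Taken on its own, the same prefix would also swallow a number that really belongs to the name, for example a series whose name starts with a year, so it is worth checking the results on such files.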

--------- doesn't work ---------
Now we need to add the underscore to the possible delimiters, alongside ' - ', '--' and ' -- '.
So instead of
( - | *-- *)
at the end of the series, we put
( - | *-- *| *_ *)
Now possible delimiters are ' - ', '--', ' -- ', '-- ', ' --', '_',' _','_ ',' _ '.
-------- end of doesn't work -------
The above construction doesn't work, because you would also have to change (?P<series>[^-]+) to (?P<series>[^-_]+). An even bigger problem is that Calibre automatically replaces underscores in filenames with spaces. Is there a setting to switch that off?

I recommend replacing underscores with ' - ' in filenames before processing the files in Calibre.
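One way to do that renaming in bulk is a short Python script. This is a rough sketch, not a tested tool. The function names are made up, it only touches the part of the name before the extension, and it is best tried on a copy of the folder first.

```python
import re
from pathlib import Path


def underscores_to_dashes(stem):
    """Replace each underscore, plus any spaces around it, with ' - '."""
    return re.sub(r" *_ *", " - ", stem)


def rename_all(folder):
    """Rename every file in `folder` whose name contains an underscore."""
    # sorted() takes a snapshot of the listing before any renaming starts.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and "_" in path.stem:
            new_name = underscores_to_dashes(path.stem) + path.suffix
            path.rename(path.with_name(new_name))


if __name__ == "__main__":
    print(underscores_to_dashes("Foundation 2_ Isaac Asimov - Foundation and Empire"))
```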

Here is the result
Code:
[0-9 ]*([[(]?(?P<series>[^-]+)[[( ]+(?P<series_index>[0-9.]+)?[])]?( - | *-- *))?(?P<author>[^-]+)( - | *-- *)(?P<title>.+)
I will leave extensive testing of the regular expression as an exercise for the reader ;-)
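As a starting point for that testing, here is a small Python harness. It is only a sketch. The filenames are invented, re.match is my assumption about how the pattern gets applied (Calibre may do it differently), and the opening brackets inside character classes are escaped only so that newer Python versions do not warn about them.

```python
import re

# The final expression from the post, with brackets escaped for Python 3.
PATTERN = re.compile(
    r"[0-9 ]*"
    r"([\[(]?(?P<series>[^-]+)[\[( ]+(?P<series_index>[0-9.]+)?[\])]?( - | *-- *))?"
    r"(?P<author>[^-]+)( - | *-- *)(?P<title>.+)"
)

FIELDS = ("series", "series_index", "author", "title")

# Hypothetical filenames (extension already removed) to try out.
SAMPLES = [
    "01 Foundation 2 - Isaac Asimov - Foundation and Empire",
    "[Foundation 2] - Isaac Asimov - Foundation and Empire",
    "Frank Herbert--Dune",
    "Untitled",
]


def parse(name):
    """Return the named groups as a dict, or None if nothing matches."""
    m = PATTERN.match(name)
    return {key: m.group(key) for key in FIELDS} if m else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", parse(sample))
```

Add your own problem filenames to SAMPLES and look at what comes out for each field.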