Hello
I’m unable to get my RegEx to add books to Calibre, and will be very grateful for assistance.
Using Excel and DOS, I’ve renamed all the files to have the following format:
<author> _ <title> (<series> <series_index>).<ext>
eg ‘Lee Child _ Without Fail (Jack Reacher 06).epub’.
AUTHOR and TITLE are separated by “ _ ”, ie SPACE UNDERSCORE SPACE.
SERIES and SERIES_INDEX are separated by “ ”, ie SPACE, and are enclosed within round brackets “( … )”.
AUTHOR and TITLE are mandatory.
SERIES and SERIES_INDEX are optional, and SERIES may occur without SERIES_INDEX.
The filename may also end (ie just before .<ext>) with a string in square brackets, preceded by a space, eg
<author> _ <title> [HTML].<ext>
This may be used with compressed files (.zip, .rar) to show the file types of their contents, eg
Alice Byron _ The Bombmaker [html, jpg].rar
This information is not added to Calibre.
I want a RegEx to capture AUTHOR, TITLE, SERIES and SERIES_INDEX.
I think the 6 examples in the following table cover all possibilities:
No.  Filename                                            Series  Index  HTML
1    Ace Atkins _ Infamous.epub                          -       -      -
2    Alice Byron _ The Bombmaker [html].rar              -       -      Yes
3    Brian Adams _ A Disaster (Pee Wee).lit              Yes     -      -
4    Beth Brown _ Bombs Away (Ace Bly) [html, jpg].rar   Yes     -      Yes
5    Chad Altman _ Noon Today (Jay Wells 04).epub        Yes     Yes    -
6    Chloe Beck _ Lullaby Town (Adam Eve 03) [html].rar  Yes     Yes    Yes
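To make the rules above concrete, here is a rough sketch in plain Python (outside Calibre) of a pattern that might cover all six cases. The names FILENAME_RE and parse_filename are made up for illustration. The sketch assumes the extension is removed before matching and that SERIES_INDEX is digits only. Whether Calibre's own matcher treats filenames the same way is also an assumption.

```python
import os
import re

# Hypothetical pattern: the group names mirror the fields described above.
FILENAME_RE = re.compile(
    r"(?P<author>.+?) _ (?P<title>.+?)"                      # mandatory AUTHOR _ TITLE
    r"(?: \((?P<series>.+?)(?: (?P<series_index>\d+))?\))?"  # optional (SERIES [INDEX])
    r"(?: \[[^\]]*\])?"                                      # optional [tag], not captured
)

def parse_filename(filename):
    """Return a dict of the captured fields, or None if the name does not fit."""
    stem, _ext = os.path.splitext(filename)  # assumption: match without the extension
    match = FILENAME_RE.fullmatch(stem)
    return match.groupdict() if match else None

if __name__ == "__main__":
    for name in (
        "Ace Atkins _ Infamous.epub",
        "Beth Brown _ Bombs Away (Ace Bly) [html, jpg].rar",
        "Chloe Beck _ Lullaby Town (Adam Eve 03) [html].rar",
    ):
        print(name, "->", parse_filename(name))
```

The lazy quantifiers (.+?) let the optional series and square-bracket parts claim their text before TITLE or SERIES absorbs it. Fields that are absent come back as None.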
In trying to understand regular expressions I wrote the attached PythonCodingInCalibre_v0-1.doc, but I’m missing something.
The following pattern works with files 5 and 6 in the above table, but not with files 1-4:
(?P<author>[^-]+) - (?P<title>[^[]+) [(](?P<series>.*) (?P<series_index>\d*)[)]
Note that I had to change the AUTHOR/TITLE separator from an underscore to a hyphen, as I couldn’t get the underscore to work. Is there something special about the underscore?
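As far as I can tell from Python's re syntax, the underscore is an ordinary literal character. Here is a quick check outside Calibre, using the pattern above with only the hyphens swapped for underscores. The name UNDERSCORE_RE is made up for illustration, and this assumes Calibre's matcher behaves like plain Python's re module.

```python
import re

# The posted pattern with "-" replaced by "_" in both places (illustrative only).
UNDERSCORE_RE = re.compile(
    r"(?P<author>[^_]+) _ (?P<title>[^[]+) [(](?P<series>.*) (?P<series_index>\d*)[)]"
)

# File 5 has both SERIES and SERIES_INDEX, so the mandatory "( ... )" part can match.
with_index = UNDERSCORE_RE.search("Chad Altman _ Noon Today (Jay Wells 04).epub")
# File 1 has no bracketed part at all, so the same pattern finds nothing.
without_series = UNDERSCORE_RE.search("Ace Atkins _ Infamous.epub")

print(with_index.groupdict() if with_index else None)
print(without_series)
```

If this holds in Calibre as well, the failures on files 1-4 would seem to come from the bracketed series part being mandatory in the pattern rather than from the separator.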
Regards, David