06-29-2008, 06:04 PM   #3
Alby
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2008
Device: Sony PRS 505
I actually had the same issue as you. I did the following:

I use a program called PDFInfo to remove the metadata (you may need a scripting tool like AutoIt to automate it), then I use the following regular expression:

(?P<authors>[^-]+)\s*-+\s*((?P<series>[^-\d]+)\s*(?P<series_index>\d)*\s*-+\s*)*(?P<title>.+)

Edit: (?P<authors>[^-]+)\s*-+\s*((?P<series>[^-]+)\s*-+\s*)*(?P<title>.+) works without the error that occurs when the series_index is not present.

The first expression supports the author's name, followed by the - separator, then an optional series name and an optional series_index, followed by another separator (tied to the series option), and finally the title.
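To make the field layout concrete, here is a minimal sketch that runs the first expression through Python's standard re module. The helper name parse_filename and the sample filenames are made up for illustration, and this only shows how the pattern behaves in plain Python; it is an assumption that calibre treats the groups the same way.

```python
import re

# The first expression from this post, split over three lines for readability.
PATTERN = re.compile(
    r"(?P<authors>[^-]+)\s*-+\s*"
    r"((?P<series>[^-\d]+)\s*(?P<series_index>\d)*\s*-+\s*)*"
    r"(?P<title>.+)"
)

def parse_filename(name):
    """Return the named groups as a dict, or None if the name does not match.

    Hypothetical helper: surrounding whitespace is stripped from each field,
    and groups that took no part in the match are left as None.
    """
    match = PATTERN.match(name)
    if match is None:
        return None
    return {
        key: (value.strip() if value is not None else None)
        for key, value in match.groupdict().items()
    }

# Author, series with an index, and title.
print(parse_filename("Jane Doe - Some Series 3 - A Title"))

# No series at all, and no spaces around the separator.
print(parse_filename("Jane Doe-A Title"))
```

Stripping the fields afterwards matters because [^-]+ is greedy and swallows the space in front of the separator before \s* gets a chance to.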

I am trying to work out why it sometimes fails when there is no series_index, but I suspect that once you define a ?P<id> group it MUST be used and cannot be made optional using a *. I could force it to pick up a space (\s), but unfortunately not all filenames contain a space next to the separator (-), and I much prefer allowing an optional space between any of the fields (\s*).
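For what it's worth, plain Python's re module does allow a named group to be optional: when the group takes no part in the match, it simply comes back as None. The sketch below shows that with the first expression. Whether a None series_index is what actually trips calibre up is only a guess on my part, and the helper series_index_or_default is a hypothetical name used for illustration.

```python
import re

# The first expression from this post, unchanged.
ORIGINAL = re.compile(
    r"(?P<authors>[^-]+)\s*-+\s*"
    r"((?P<series>[^-\d]+)\s*(?P<series_index>\d)*\s*-+\s*)*"
    r"(?P<title>.+)"
)

def series_index_or_default(filename, default=None):
    """Return the captured series index as an int, or `default` when the
    filename does not match or the series_index group captured nothing."""
    match = ORIGINAL.match(filename)
    if match is None:
        return default
    raw = match.group("series_index")  # None when no digit was captured
    return int(raw) if raw is not None else default

with_index = ORIGINAL.match("Jane Doe - Some Series 3 - A Title")
without_index = ORIGINAL.match("Jane Doe - Some Series - A Title")

print(with_index.group("series_index"))     # 3
print(without_index.group("series_index"))  # None
```

One side effect worth knowing: because the group is written as (\d)*, it is overwritten on every repetition, so a two-digit index keeps only its last digit. Writing it as (?P<series_index>\d+)? would capture the whole number while still leaving the group optional.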

If this doesn't end up working, I may use Python/Perl/VBScript to split the file name up and use PDFInfo to populate the metadata fields instead. Unfortunately there is no metadata field for Series or Series_Index (at least using PDFInfo), so I may need to find another tool that allows me to add fields, and hope Calibre can map these fields to Series & Series_Index.
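The splitting half of that fallback could look roughly like the sketch below, in Python. It assumes filenames shaped like "Author - Series N - Title.pdf" with the series part optional; the function name split_filename is hypothetical, and writing the fields back into the PDF is deliberately left out because I don't know what interface PDFInfo offers for that.

```python
import os
import re

def split_filename(filename):
    """Split 'Author - Series N - Title.pdf' into its fields.

    Hypothetical helper: returns a dict with authors, series, series_index
    and title; anything that cannot be found is left as None.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # Split on runs of hyphens, swallowing any optional spaces around them.
    parts = [p.strip() for p in re.split(r"\s*-+\s*", stem) if p.strip()]

    fields = {"authors": None, "series": None, "series_index": None, "title": None}
    if len(parts) < 2:
        # No separator found: treat the whole name as the title.
        fields["title"] = stem.strip()
        return fields

    fields["authors"] = parts[0]
    if len(parts) == 2:
        fields["title"] = parts[1]
        return fields

    # Three or more parts: the second is the series, the rest is the title.
    series = parts[1]
    trailing = re.search(r"\s*(\d+)$", series)
    if trailing:
        fields["series_index"] = int(trailing.group(1))
        series = series[:trailing.start()]
    fields["series"] = series
    fields["title"] = " - ".join(parts[2:])
    return fields

print(split_filename("Jane Doe-Some Series 3-A Title.pdf"))
print(split_filename("Jane Doe - A Title.pdf"))
```

Like the regular expression, this treats every hyphen as a separator, so a hyphenated title or author name would be split in the wrong place.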

P.S. I realise you may not be using PDFs, but that's all that is applicable for me. The regular expression may still end up working for you with a bit of tweaking.

Last edited by Alby; 06-29-2008 at 10:02 PM. Reason: Typos & Extra Information