07-25-2010, 06:45 PM   #1
dloyer4
Junior Member
dloyer4 began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: none
Regular Expression Help Needed

I'm importing books and need to write a regular expression to handle the three file-name formats they're in. The formats are as follows:

1. author - title.type
2. author - title - series.type
3. author - title - series - series_index.type

Each category is separated by a hyphen with a space on either side. The regex needs to be able to handle the occasional hyphenated word in the title, but those hyphens are not preceded or followed by a space.

So far, I have the following regex:
(?P<author>[^-]+) - (?P<title>[^-]+) - (?P<series>[^-]+) - (?P<series_index>[^.]+)?
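A minimal sketch for trying this pattern outside Calibre with Python's re module. The file names are made up for illustration, and the use of re.search is an assumption about how Calibre applies the pattern, not something confirmed here.

```python
import re

# The pattern from this post, unchanged.
PATTERN = re.compile(
    r"(?P<author>[^-]+) - (?P<title>[^-]+) - (?P<series>[^-]+) - (?P<series_index>[^.]+)?"
)

# Hypothetical file names, one per format.
SAMPLES = [
    "Jane Doe - Some Title.epub",
    "Jane Doe - Some Title - Some Series.epub",
    "Jane Doe - Some Title - Some Series - 3.epub",
]

def try_match(name):
    """Return the named groups as a dict, or None when the pattern does not match."""
    m = PATTERN.search(name)
    return m.groupdict() if m else None

for name in SAMPLES:
    print(name, "->", try_match(name))
```

The pattern requires three " - " separators, so only the third sample matches; the other two return None. The captured series_index is the string "3", so if Calibre converts it to a floating-point number (an assumption), the .0 would come from that conversion and not from the regex.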

This works fine for case 3, though it seems to put a .0 at the end of the series number: book #3 in a series is given a series_index of 3.0 when I test the file name. I don't know whether this is a problem with my regex or simply how Calibre displays that number.

For case 1, the title is listed as "author - title" and the rest of the fields are unknown; for case 2, the title is listed as "author - title - series" and the rest of the fields are again unknown.
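One direction that might cover all three formats is to wrap the series and series_index parts in optional groups and to split only on a hyphen with a space on either side. This is an untested-in-Calibre sketch: the parse_name helper is hypothetical, and it assumes the ".type" extension is stripped before matching.

```python
import re

# Sketch only: the two optional non-capturing groups make series and
# series_index optional. Lazy ".+?" lets hyphenated words through, because
# fields are split only on " - " (hyphen with a space on either side).
PATTERN = re.compile(
    r"^(?P<author>.+?) - (?P<title>.+?)"
    r"(?: - (?P<series>.+?))?"
    r"(?: - (?P<series_index>\d+(?:\.\d+)?))?$"
)

def parse_name(filename):
    """Return the named fields for a file name, or None if nothing matches."""
    # Assumption: drop the trailing ".type" before matching.
    stem = filename.rpartition(".")[0] or filename
    m = PATTERN.search(stem)
    return m.groupdict() if m else None

for name in (
    "Jane Doe - Some Title.epub",
    "Jane Doe - Well-Known Title - Some Series.epub",
    "Jane Doe - Well-Known Title - Some Series - 3.epub",
):
    print(parse_name(name))
```

Fields that are absent come back as None. The sketch assumes series_index is a plain number such as 3 or 3.5.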

Anyone able to give me a hand with this?

Thanks,
Dennis