About 90% of my files are named in the format below. (The other 10% do not include the pub date.)
Format: Series-series number title (author) Pub date.txt
(Series 2-4 letters)
(Series number 2-4 digits)
(pub date-year only)
ex.
BA-123 How it works (John Smith) 1989.txt
ROT-4089 Make it this way (Jane Smith) 2009.txt
Playing around with the regex, I was able to separate the series and number, but I could not work out the title and author. Typically I ended up with:
Title: How it works (John Smith
Author: )
And pubdate does not work at all.
Unfortunately I kept changing it around, and now it does not work at all and I can't remember what I had that almost worked.
One thing I have had trouble with is the "(" and ")" and trying to search for them after the title. If necessary, I CAN search and replace in the titles to remove them or substitute another character for them, to make it easier to run a regex (just not "-", as some titles have a "-" in them).
Anyone have any clue how to do this?
edit: This is as close as I can come to what I had
Code:
(?P<series>[^_0-9-]*)-(?P<series_index>[0-9]*)(?P<title>[^_-]+) \(?(?P<author>[^_].+) -?(?P<date>[^_].+) ?
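
For comparison, here is a minimal sketch of a pattern that appears to fit the format described above. It is written in Python only because the attempt uses Python-style named groups; the function name parse_filename, the four-digit year, and the optional ".txt" ending are assumptions for illustration, not something confirmed to work in any particular tool. The two ideas it relies on are escaping the literal parentheses as \( and \), and making the title lazy (.+?) so it stops just before the author. The date group is optional to cover the files that have no pub date.

```python
import re

# Sketch only: group names follow the attempt above; the rest is assumed.
FILENAME_RE = re.compile(
    r"""
    ^(?P<series>[A-Za-z]{2,4})      # series: 2-4 letters
    -(?P<series_index>\d{2,4})      # series number: 2-4 digits
    \s+(?P<title>.+?)               # title: lazy, so it stops before the author
    \s+\((?P<author>[^()]+)\)       # author between literal, escaped parentheses
    (?:\s+(?P<date>\d{4}))?         # optional year (assumed to be four digits)
    (?:\.txt)?$                     # optional extension, then end of the name
    """,
    re.VERBOSE,
)


def parse_filename(name):
    """Return a dict of the named groups, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "BA-123 How it works (John Smith) 1989.txt",
        "ROT-4089 Make it this way (Jane Smith) 2009.txt",
        "BA-123 How it works (John Smith).txt",  # hypothetical file without a date
    ]
    for sample in samples:
        print(parse_filename(sample))
```

Because the pattern is anchored at both ends and the author group excludes parentheses, a "-" inside the title should not confuse it. For a file with no pub date, the date group simply comes back empty.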