OK, I have figured out a regex that does what I need, except that it doesn't handle the date correctly. The dates in my file names are just the 4-digit year, but they get extracted as YEAR-02-15. So the year is correct, but every document ends up with a month/day of 02/15. How can I stop the month and day from being added?
The regex I am using:
^(?P<title>([^_\(]+)(\w+)) \((?P<author>[^\,]+)\, (?P<publisher>[^\,]+)\, (?P<published>[^\)]+)
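To see what the regex itself captures, here is a minimal sketch in Python, which uses the same named-group syntax. The file name is made up for illustration, and the helper name parse_filename is hypothetical. If the published group comes back as only the four digits, the 02-15 is probably being filled in afterwards by whatever turns that string into a date, not by the regex, though that is an assumption about the tool rather than something confirmed here.

```python
import re

# The pattern from the post, unchanged (split across two lines only for readability).
PATTERN = re.compile(
    r"^(?P<title>([^_\(]+)(\w+)) \((?P<author>[^\,]+)\, "
    r"(?P<publisher>[^\,]+)\, (?P<published>[^\)]+)"
)


def parse_filename(name):
    """Return the named groups as a dict, or None if the name does not match."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


# Hypothetical file name, made up for illustration.
fields = parse_filename("My Book Title (Jane Doe, Acme Press, 1999).pdf")
print(fields)
```

With this sample name the published group holds just the year string, with no month or day attached.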