03-01-2015, 09:30 PM   #1
JohnnyBook
Addict
JohnnyBook holds these truths to be self-evident.
Posts: 200
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
Regex help on reading Metadata from file name.

90% of my files are in the format below (the other 10% do not include the pub date):

Format: Series-series number title (author) Pub date.txt
(Series 2-4 letters)
(Series number 2-4 digits)
(pub date-year only)

Examples:

BA-123 How it works (John Smith) 1989.txt
ROT-4089 Make it this way (Jane Smith) 2009.txt

Playing around with the regex, I was able to separate the series and number, but I could not work out the title and author. Typically I ended up with:

Title: How it works (John Smith
Author: )

And pubdate does not work at all.

Unfortunately, I kept changing it around and now it does not work at all, and I can't remember what I had that almost worked.

One thing I have had trouble with is the "(" and ")" characters and matching them after the title. If necessary, I CAN search and replace in the file names to swap them for another character to make the regex easier to write (just not "-", as some titles have a "-" in them).

Anyone have any clue how to do this?

edit: This is as close as I can come to what I had
Code:
(?P<series>[^_0-9-]*)-(?P<series_index>[0-9]*)(?P<title>[^_-]+) \(?(?P<author>[^_].+) -?(?P<date>[^_].+) ?
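
For comparison, here is a rough sketch in Python of one way a pattern for the layout described above could be structured. It is not a tested drop-in for any particular program: the names FILENAME_RE and parse_name are made up for illustration, the group names are simply carried over from the attempt above (the tool doing the import may expect a different name for the pub date group), and it assumes the ".txt" extension may or may not be part of the string being matched. The main ideas are escaping the parentheses as \( and \), making the title lazy so it stops at the author, and wrapping the year in an optional group for the files that lack it.

```python
import re

# Hypothetical pattern for names like "BA-123 How it works (John Smith) 1989.txt".
# FILENAME_RE and parse_name are illustrative names, not part of any tool.
FILENAME_RE = re.compile(
    r"^(?P<series>[A-Za-z]{2,4})"      # series: 2-4 letters
    r"-(?P<series_index>[0-9]{2,4})"   # series number: 2-4 digits
    r"\s+(?P<title>.+?)"               # title: lazy, so it may contain "-"
    r"\s*\((?P<author>[^()]+)\)"       # author between literal (escaped) parens
    r"(?:\s+(?P<date>[0-9]{4}))?"      # optional four-digit year
    r"(?:\.txt)?$"                     # optional extension, then end of name
)


def parse_name(name):
    """Return a dict of the named groups, or None if the name does not fit."""
    m = FILENAME_RE.match(name)
    return m.groupdict() if m else None


if __name__ == "__main__":
    for name in (
        "BA-123 How it works (John Smith) 1989.txt",
        "ROT-4089 Make it this way (Jane Smith) 2009.txt",
        "BA-123 How it works (John Smith).txt",  # no pub date
    ):
        print(name, "->", parse_name(name))
```

Anchoring the pattern with ^ and $ is what lets the lazy title work: the title grows only as far as needed for the rest of the pattern (author in parentheses, optional year) to reach the end of the name.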