12-07-2009, 08:04 PM | #1 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Nov 2009
Device: None
Metadata from file name question
So I have a set of books whose names vary from things like "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)" to "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)", and I want to add them in bulk. I've currently got this setup: (?P<author>.+) - (?P<series>.+) -(?P<title>[^_]+). It does a decent job on both, but in both cases it leaves the format tag on the end of the title, giving a title like " To Your Scattered Bodies Go (.html.jpg v1" for the first example. For the second example it also gives a series like "(Witches of Eileanan 2)". Is there a way to keep "(.html.jpg v1" and the like from being added to the end of every title, and to keep the parentheses out of the series as in the second example? Also, can I have it match the series index automatically? I can't seem to build an expression that returns anything meaningful for the "series_index" variable. Much appreciate the help.
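To see where the leftovers come from, here is a quick sketch that runs the expression above through Python's `re` module. calibre is written in Python, but treating a plain `re.match` as equivalent to what calibre does internally is an assumption. On the names exactly as quoted, the title keeps its leading space and the whole parenthesized tail, the series keeps its parentheses, and the index stays glued to the series because no group is set aside for it:

```python
import re

# The expression from the question, unchanged.
ORIGINAL = re.compile(r"(?P<author>.+) - (?P<series>.+) -(?P<title>[^_]+)")

NAMES = [
    "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)",
    "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)",
]


def parse(name):
    """Return the named groups, or None if the name doesn't fit the pattern."""
    m = ORIGINAL.match(name)
    return m.groupdict() if m else None


for name in NAMES:
    # repr() makes the leading space in the captured title visible.
    print({key: repr(value) for key, value in parse(name).items()})
```

The greedy `.+` groups and the `[^_]+` title class (which only stops at an underscore) are what let the extra text through.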
12-08-2009, 01:02 AM | #2 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
I've lost the exact link, but look for the MobileRead topic named "Tyrannosaurus Regex". I know I posted something there last year about automating this kind of input.
Ultimately, though, no single expression will handle all of your files. They appear to be named using many different schemes, so you'll need several different filters and will have to import selectively.
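One way to follow that advice is to keep one expression per naming scheme and try them in order, most specific first, setting aside anything that matches none of them. The sketch below uses Python's `re`; the two patterns, the `parse` helper, and the made-up "Doe, Jane" filename are illustrative assumptions, not anything calibre provides:

```python
import re

# Hypothetical per-scheme patterns, most specific first. Each title group
# stops before the first "(", so "(.rtf v0.9)"-style tags are left out.
PATTERNS = [
    # "Author - Series N - Title (tags)" or "Author - (Series N) - Title (tags)"
    re.compile(
        r"^(?P<author>.+?) - \(?(?P<series>.+?) (?P<series_index>\d+)\)? - "
        r"(?P<title>[^(]+?)\s*(?:\(.*)?$"
    ),
    # "Author - Title (tags)"
    re.compile(r"^(?P<author>.+?) - (?P<title>[^(]+?)\s*(?:\(.*)?$"),
]


def parse(name):
    """Return the groups from the first pattern that matches, else None."""
    for pattern in PATTERNS:
        m = pattern.match(name)
        if m:
            return m.groupdict()
    return None  # no scheme fits: set this file aside and import it by hand


print(parse("Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)"))
print(parse("Doe, Jane - Some Standalone Title (.lit v1.0)"))  # made-up name
```

Order matters here: the second pattern would also accept the Riverworld name, with "Riverworld 1 - To Your Scattered Bodies Go" as the title, which is why the series pattern is tried first.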
12-08-2009, 01:30 AM | #3 | |
Enthusiast
Posts: 49
Karma: 10
Join Date: Nov 2009
Device: None
The only thing I really want out of it now is for it to ignore the junk after the title in the file names, i.e. stuff like "(.html.jpg v1.0)". Is there a way to make that expression specifically ignore anything in parentheses when it captures the title? For reference, here is the old post. Edit: and here is the expression:
Last edited by Nitrousoxide; 12-08-2009 at 01:45 AM.
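One way to make the title ignore the parenthesized part is to build the title group from a character class that excludes "(", so it cannot run past the first opening parenthesis. A minimal Python sketch of just that piece, assuming no real title in the collection contains "(" itself (one that did would be cut short); `clean_title` is a hypothetical helper name:

```python
import re

# Lazily take characters other than "(", then allow optional whitespace and an
# optional "(...)" tail before the end. A title containing "(" would be truncated.
TITLE = re.compile(r"(?P<title>[^(]+?)\s*(?:\(.*)?$")


def clean_title(text):
    """Return text with any trailing " (...)" tag removed."""
    m = TITLE.match(text)
    return m.group("title") if m else text


print(clean_title("To Your Scattered Bodies Go (.html.jpg v1.0)"))
print(clean_title("Pool of Two Moons (.rtf v0.9)"))
```

The same `[^(]+?` group can replace the title part of a full filename expression.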
12-08-2009, 11:59 AM | #4 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Nov 2009
Device: None
Well, good news for me. I managed to fix the problem of it adding stuff like "(.html.jpg v1" to the end of the title. I just added "(?P<publishdate>\()" to the end of the expression from the Tyrannosaurus Regex thread so that all the format information gets thrown into the publishdate metadata, and since it isn't formatted correctly for a date, it just gets thrown out entirely.
The ONLY thing I still need to fix is that it adds a "(" to the beginning of the series when the file name writes the series like "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)". Right now it outputs a series like "(Witches of Eileanan". I'm not sure why it drops the closing parenthesis, but if I can get it to drop the opening one as well, I should have an expression that works for almost every book I'm trying to add. As it stands now, my expression looks like this:
(?P<author>((?!\s-\s).)*)\s-(?:\s((?P<series>.+) (?P<series_index>\d+)((?!\s-\s).)*)\s-)?\s(?P<title>.*) (?P<publishdate>\()
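Running that expression through Python's `re` (used here as a stand-in for calibre's own matching, which is an assumption) shows where the closing parenthesis goes: the greedy `(?P<series>.+)` stops right before " 2", and the unnamed `((?!\s-\s).)*` after the index then swallows the ")" because it consumes anything up to the next " - ". The publishdate group itself only ever captures the single "(" character; what actually removes the format tag is that the title must be followed by " (", and the rest of the name lies beyond the end of the match.

```python
import re

# The expression exactly as posted above, split over two lines for width.
POSTED = re.compile(
    r"(?P<author>((?!\s-\s).)*)\s-(?:\s((?P<series>.+) (?P<series_index>\d+)"
    r"((?!\s-\s).)*)\s-)?\s(?P<title>.*) (?P<publishdate>\()"
)

m = POSTED.match("Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)")
print(m.group("series"))       # (Witches of Eileanan
print(m.group("series_index")) # 2
print(m.group("title"))        # Pool of Two Moons
print(m.group("publishdate"))  # (
```

Because the title is a greedy `.*`, it stops before the last " (" in the name, so a title that contains its own parentheses keeps them.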
12-08-2009, 12:07 PM | #5 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Stick a \({0,1} in front of the series expression
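`\({0,1}` is an optional literal "(" (equivalent to `\(?`). Below is a sketch of the expression from post #4 with it placed right after the `\s` that precedes the series, again tested with Python's `re` as an assumed stand-in for calibre. The exact placement is one reading of "in front of the series expression", and the no-series filename is made up:

```python
import re

# The posted expression with \({0,1} inserted just before the series group.
FIXED = re.compile(
    r"(?P<author>((?!\s-\s).)*)\s-(?:\s\({0,1}((?P<series>.+) (?P<series_index>\d+)"
    r"((?!\s-\s).)*)\s-)?\s(?P<title>.*) (?P<publishdate>\()"
)

NAMES = [
    "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)",
    "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)",
    "Doe, Jane - Some Standalone Title (.lit v1.0)",  # made-up, no series
]

for name in NAMES:
    m = FIXED.match(name)
    print(m.group("author"), "|", m.group("series"), "|",
          m.group("series_index"), "|", m.group("title"))
```

The optional "(" is tried first, so it is consumed whenever present; the closing ")" still lands in the unnamed group after the index, so neither parenthesis reaches the series. Names without a series skip the whole optional block and leave `series` empty.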
12-09-2009, 11:50 AM | #6 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Nov 2009
Device: None