Old 12-07-2009, 09:04 PM   #1
Nitrousoxide
Enthusiast
Posts: 46
Karma: 10
Join Date: Nov 2009
Device: None
Metadata from file name question

So I have a set of books and their names vary from stuff like "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)"

to

"Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)"

I want to add these in bulk. I've currently got this set up:

(?P<author>.+) - (?P<series>.+) -(?P<title>[^_]+)

and it does a decent job for both, but for both examples the title comes out like " To Your Scattered Bodies Go (.html.jpg v1". For the second example it also gives a series like "(Witches of Eileanan 2)". Is there a way to keep the "(.html.jpg v1" from being added to the end of every title, and is there also a way to keep the parentheses out of the series in the second example?

Also, can I have it automatically match the series index as well? I can't seem to build an expression that returns anything meaningful using the "series_index" variable.

I'd much appreciate the help.
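
For anyone wanting to reproduce the behaviour described above outside calibre, here is a minimal sketch using Python's re module. It assumes calibre drops everything after the last dot in the file name and then searches the remainder with the pattern; that assumption is consistent with the reported titles ending in "v1", but it is not confirmed in this thread. The helper names stem and parse are made up for illustration.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(r"(?P<author>.+) - (?P<series>.+) -(?P<title>[^_]+)")


def stem(filename: str) -> str:
    """Drop everything after the last dot (assumed calibre-like behaviour)."""
    return filename.rpartition(".")[0]


def parse(filename: str) -> dict:
    """Return the named groups the pattern captures, or {} if it fails."""
    match = PATTERN.search(stem(filename))
    return match.groupdict() if match else {}


first = parse(
    "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)"
)
second = parse(
    "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)"
)

# [^_]+ accepts every character except an underscore, so the title runs to
# the end of the stem and keeps the leading space left over from " -".
print(repr(first["title"]))
# .+ accepts parentheses and digits alike, so the series keeps both.
print(repr(second["series"]))
```

The two symptoms have separate causes: the title group has nothing that tells it to stop before " (", and the series group has nothing that separates the name from the number or excludes the parentheses.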
Old 12-08-2009, 02:02 AM   #2
Sabardeyn
Guru
Posts: 630
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
I've lost the exact link, but look for the MobileRead topic named "Tyrannosaurus Regex". I know I posted something in there last year about automating input.

Ultimately, though, no single expression will work for you completely. Your files appear to be named using several different schemes, so you'll need several different filters and will have to import selectively.
Old 12-08-2009, 02:30 AM   #3
Nitrousoxide
Quote:
Originally Posted by Sabardeyn
I've lost the exact link, but look for the MobileRead topic named "Tyrannosaurus Regex". I know I posted something in there last year about automating input.

Ultimately, though, no single expression will work for you completely. Your files appear to be named using several different schemes, so you'll need several different filters and will have to import selectively.
Well, the Python expression in the old topic does a pretty darn good job. It even copes with file names that don't have a series in them.

The only thing I really want out of it now is for it to ignore the crap after the title in the file names, stuff like "(.html.jpg v1.0)".

Is there a way to have that expression specifically ignore stuff in parentheses when it's picking up the title?

For reference here is the old post.

Edit: and here is the expression:
Code:
(?P<author>((?!\s-\s).)*)\s-(?:\s((?P<series>.+) (?P<series_index>\d+)((?!\s-\s).)*)\s-)?\s(?P<title>.*) 

Last edited by Nitrousoxide; 12-08-2009 at 02:45 AM.
Old 12-08-2009, 12:59 PM   #4
Nitrousoxide
Well, good news for me. I managed to fix the problem with it adding stuff like "(.html.jpg v1" to the end of the title. I just added "(?P<publishdate>\()" to the end of the expression from the Tyrannosaurus Regex thread so that all the stuff about formats would be thrown into the publishdate metadata, and since it's not formatted at all correctly for that, it just gets thrown out entirely.

The ONLY thing I need to fix now is how it adds a "(" to the beginning of the series if the file name has the series written like "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)". Right now it gives an output like "(Witches of Eileanan".

I'm not sure why it drops the second parenthesis, but if I can get it to drop the first as well I should have an expression that works for almost every book I'm trying to add.

As it stands now my expression looks like this:
Code:
(?P<author>((?!\s-\s).)*)\s-(?:\s((?P<series>.+) (?P<series_index>\d+)((?!\s-\s).)*)\s-)?\s(?P<title>.*) (?P<publishdate>\()
Anyone have any suggestions to fix that last problem?
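
One way to see what the expression above captures is to run it through Python's re module. This is only a sketch: it assumes calibre applies the pattern with search semantics, so any text after the last matched group is simply left unmatched, and the helper name capture is made up for illustration. Under that assumption the trailing group only ever holds the "(" itself, and the title stops there because the greedy .* has to leave room for " (". The closing ")" after the series number is consumed by the unnamed ((?!\s-\s).)* group that follows series_index, which would explain why only the opening parenthesis shows up in the series.

```python
import re

# The expression from the post above, split across lines for readability.
EXPR = (
    r"(?P<author>((?!\s-\s).)*)\s-"
    r"(?:\s((?P<series>.+) (?P<series_index>\d+)((?!\s-\s).)*)\s-)?"
    r"\s(?P<title>.*) (?P<publishdate>\()"
)


def capture(name: str) -> dict:
    """Return the named groups of interest, or {} if the pattern fails."""
    match = re.search(EXPR, name)
    if match is None:
        return {}
    wanted = ("author", "series", "series_index", "title", "publishdate")
    return {key: match.group(key) for key in wanted}


result = capture(
    "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)"
)
for key, value in result.items():
    print(f"{key}: {value!r}")
```

Here the series comes out as "(Witches of Eileanan", the series index as "2", and the title as "Pool of Two Moons", matching the behaviour described in the post.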
Old 12-08-2009, 01:07 PM   #5
kovidgoyal
creator of calibre
Posts: 26,436
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Stick a \({0,1} in front of the series expression
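
As a sketch of that suggestion, the snippet below places \({0,1} immediately before the (?P<series>...) group of the expression from the previous post and tries it on the two file names from the thread. The exact placement is an interpretation of "in front of the series expression", and the helper name parse is hypothetical. Note that \({0,1} means the same thing as \(? (an optional literal opening parenthesis).

```python
import re

# Expression from post #4 with an optional "(" added before the series group.
FIXED = re.compile(
    r"(?P<author>((?!\s-\s).)*)\s-"
    r"(?:\s(\({0,1}(?P<series>.+) (?P<series_index>\d+)((?!\s-\s).)*)\s-)?"
    r"\s(?P<title>.*) (?P<publishdate>\()"
)


def parse(name: str) -> dict:
    """Return author, series, series_index and title, or {} on no match."""
    match = FIXED.search(name)
    if match is None:
        return {}
    wanted = ("author", "series", "series_index", "title")
    return {key: match.group(key) for key in wanted}


plain = parse(
    "Farmer, Philip Jose - Riverworld 1 - To Your Scattered Bodies Go (.html.jpg v1.0)"
)
wrapped = parse(
    "Forsyth, Kate - (Witches of Eileanan 2) - Pool of Two Moons (.rtf v0.9)"
)

print(plain)
print(wrapped)
```

Because the optional "(" is tried before the series group starts, the parenthesis is consumed outside the named group when it is present and skipped when it is not, so both naming styles yield a clean series name and index.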
Old 12-09-2009, 12:50 PM   #6
Nitrousoxide
Quote:
Originally Posted by kovidgoyal
Stick a \({0,1} in front of the series expression
Thanks! Worked like a charm!
All times are GMT -4.