MobileRead Forums - View Single Post - Creating metadata pre/post-import

leehach · 05-16-2011, 03:16 AM

Hello all!

I'm a new user intending to use calibre with my new Kindle DX. I have a large library of PDFs from academic journals that I would like to pull into calibre. The naming format for most of my files is "Author(s)[, et al] YYYY - Title.pdf"

I already tried importing once and found that calibre had read the metadata embedded in the PDFs, which was pretty ugly, so I blew the library away and am trying again with calibre restricted to reading metadata from filenames. There seem to be several ways to create the metadata.

1) Create the metadata on import using Preferences→Add/Save. I read in another thread that only the five fields that appear in that dialog can be imported this way, so I can't pull in the publication year, even though it is part of my naming scheme. I can at least prevent the year from appearing in the author field with:

Code:

(?P<author>.+) \d\d\d\d - (?P<title>[^_]+)

I haven't figured out how to eliminate the ", et al" that appears in some filenames. If I try

Code:

(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)

I end up with no matches for the author field.

2) Import the entire filename as the document title, then parse the fields using the bulk metadata editor. However, when I open the "Edit metadata in bulk" dialog, I can't relate what I see to what I'm reading in that documentation link. I'm running 0.6.42 because that's the version in the Ubuntu Lucid repository. Is this functionality only available in a later version? If not, can someone help me figure out what to do?

3) Since I'm experienced with SQL, I'm thinking about importing the entire filename as the document title, then using SQL to pull out the year and authors, strip ", et al", and so on.
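For option 1, a likely reason the second pattern finds nothing: it expects ",et al" with no space after the comma, while the filenames use ", et al". It also makes the phrase mandatory, so files without it cannot match either. The sketch below makes ", et al" optional and uses a lazy author group so the phrase stays out of the author. It is checked with Python's re module, assuming calibre (which is written in Python) applies the pattern the same way and to the filename without its ".pdf" extension; neither detail is confirmed here for 0.6.42. The parse helper and the sample filenames are hypothetical.

```python
import os
import re

# The failing attempt from the post: requires ",et al" with no space after
# the comma, and requires the phrase to be present at all.
ORIGINAL = re.compile(r"(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)")

# Sketch of an alternative: lazy author group, an OPTIONAL ", et al"
# (with the space), then " YYYY - " before the title.
PATTERN = re.compile(r"(?P<author>.+?)(?:, et al)? \d{4} - (?P<title>[^_]+)")


def parse(filename, pattern=PATTERN):
    """Hypothetical helper: return (author, title), or None if no match."""
    stem, _ext = os.path.splitext(filename)  # assumption: extension is dropped
    m = pattern.match(stem)
    return (m.group("author"), m.group("title")) if m else None


samples = [
    "Smith 2009 - Sparse coding in V1.pdf",
    "Jones, Lee, et al 2011 - A survey of metadata.pdf",
]
for name in samples:
    # Compare the original attempt with the alternative on each sample.
    print(parse(name, ORIGINAL), parse(name))
```

To try it in calibre, only the pattern string itself (the text inside r"...") would go into the filename regex field under Preferences→Add/Save.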
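For option 2, if the bulk editor in a given version offers regular-expression search and replace on a field (not confirmed here for 0.6.42), the job reduces to one pattern with capture groups and a different replacement for each target field. The sketch below emulates that with Python's re.sub on a title holding the full filename stem. The split_fields helper is hypothetical, and the dialog's own replacement syntax may differ from Python's \1-style backreferences.

```python
import re

# One pattern with three capture groups: author (lazy), year, title.
# ", et al" is optional and sits outside the author group, so it is dropped.
FULL = re.compile(r"^(.+?)(?:, et al)? (\d{4}) - (.+)$")


def split_fields(full_title):
    """Hypothetical helper: derive (author, year, title) from a title that
    holds the whole filename stem, one substitution per target field."""
    author = FULL.sub(r"\1", full_title)
    year = FULL.sub(r"\2", full_title)
    title = FULL.sub(r"\3", full_title)
    return author, year, title


print(split_fields("Jones, et al 2011 - A survey of metadata"))
print(split_fields("Untitled scan"))  # no match: every field left unchanged
```

Because re.sub returns the input unchanged when nothing matches, a file that breaks the naming scheme keeps its full filename in every field rather than being blanked.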
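For option 3, the parsing itself can be done with SQLite's built-in string functions, since calibre keeps its library in an SQLite database (metadata.db). The sketch below runs against a hypothetical one-table in-memory database, not calibre's real schema, which stores authors in a separate table rather than as a column on the book row. Any real edits would be safest on a copy of the library. The query finds the first " - ", takes the four characters before it as the year and everything before " YYYY" as the author, then strips ", et al". It assumes every title follows the naming format exactly, and instr() needs SQLite 3.7.15 or later.

```python
import sqlite3

# Hypothetical stand-in table, NOT calibre's real schema. Each title holds
# the whole filename stem, as in option 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [("Smith 2009 - Sparse coding in V1",),
     ("Jones, et al 2011 - A survey of metadata",)],
)

# instr(title, ' - ') is the position of the space before the dash:
#   author = text before " YYYY" (position - 6 chars), minus ", et al"
#   year   = the 4 chars just before that space
#   title  = everything after " - "
query = """
SELECT replace(substr(title, 1, instr(title, ' - ') - 6), ', et al', '') AS author,
       substr(title, instr(title, ' - ') - 4, 4)                        AS year,
       substr(title, instr(title, ' - ') + 3)                           AS new_title
FROM books
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for author, year, new_title in rows:
    print(author, year, new_title, sep=" | ")
```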

Anyway, any comments on your experiences, or on your preferred way to accomplish this, would be appreciated.

--Lee
