#1
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
Hello all!
I'm a new user intending to use calibre with my new Kindle DX. I have a large library of PDFs from academic journals that I would like to pull into Calibre. The naming format for most of my files is "Author(s)[, et al] YYYY - Title.pdf". I already tried importing once and realized calibre tried to read the PDF metadata, which was pretty ugly, so I blew it away and am trying again, restricting it to read metadata from filenames. There seem to be several ways to create the metadata.
1) Create the metadata on import using Preferences→Add/Save. I read in another thread that only the five fields that appear in that dialog can be imported this way, so I can't pull in publication year, which is included in my naming scheme. I can at least prevent the year from appearing in the author field with
Code:
(?P<author>.+) \d\d\d\d - (?P<title>[^_]+)
Code:
(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)
2) Import the entire filename as the document title, then try to parse the fields using the bulk metadata edit. However, when I open the "Edit metadata in bulk" dialog, I can't relate what I see to what I'm reading in that documentation link. I'm running 0.6.42 because that's the version in the Ubuntu Lucid repository. Is this functionality only available in a later version? If not, can someone help me figure out what to do?
3) Since I'm experienced with SQL, I'm thinking about importing the entire filename as the document title, then using SQL commands to pull out the year and authors, eliminate ", et al", etc.
Anyway, any comments on your experiences, and on your preferred way to accomplish this, would be appreciated.
--Lee
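As a quick check outside calibre, the two expressions in this post can be tried with Python's `re` module. This is only a sketch: the sample name is shortened from one of the example filenames given later in the thread, and the assumption that the pattern sees the filename without its extension is not confirmed anywhere here. Under that assumption, the first pattern keeps ", et al" in the author, and the second, which has no space after the comma, does not match a name written as "Chernick, et al".

```python
import re

# The two import patterns from the post, unchanged.
PLAIN = r"(?P<author>.+) \d\d\d\d - (?P<title>[^_]+)"
ET_AL = r"(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)"

# Assumption: the pattern is applied to the filename without ".pdf".
stem = "Chernick, et al 2011 - The Impact of the Great Recession"

plain_match = re.match(PLAIN, stem)
et_al_match = re.match(ET_AL, stem)

plain_author = plain_match.group("author") if plain_match else None
print(plain_author)  # author as captured by the first pattern
print(et_al_match)   # None when the second pattern does not match
```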
#2
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
#3
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
Any advice on (a) a regular expression to weed out ", et al", and (b) choosing between parsing metadata during import vs. using the bulk metadata editor post-import? Regarding (b), it seems that the bulk editor is far more flexible.
I strongly prefer to use repositories to maintain my applications (including PPAs in addition to official repositories), but there's a big gap between 0.6.42 and 0.8.1, so it looks like I'll use the ad hoc install.
--Lee
#4
Well trained by Cats
Posts: 30,925
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I even set up a menu item in Ubuntu to do the grunt work (I still need to supply the sudo password and press Enter twice).
#5
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
#6
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
Yeah, I always manually create a menu item for installers that don't do it automatically. But does this mean calibre has to be run with sudo? I notice that the version installed from repository does not use sudo in the automatically created menu item. Why is this necessary?
#7
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
Manichean, here are some example file names:
Chernick, et al 2011 - The Impact of the Great Recession and the Housing Crisis on the Financing of America's Largest Cities.pdf
Dalgaard 2008 - Introductory Statistics with R.pdf
Shrader-Frechette 2001 - MacIntyre on Human Rights.pdf
Theus & Lauer 1999 - Visualizing Loglinear Models.pdf
Verzani - SimpleR.pdf
1766_1103_CLT Reader Part I.pdf
Urban Stud-2007-Butler-1161-74.pdf
The first title uses "et al". Subsequent ones include authors with hyphenated names, joint authors, and filenames with no publication date. The last two show that some files in my library have odd filenames, which probably came from the website I downloaded them from. I tested
Code:
(?P<author>[^_]+) (?P<published>\d\d\d\d) - (?P<title>.+)
Thanks,
--Lee
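A small Python sketch can show which of the example filenames the tested expression accepts. It assumes the pattern is applied to the filename with the extension removed; `FILENAMES`, `parse` and `results` are names made up for this illustration, and calibre's own import code may behave differently.

```python
import re

# The expression tested in the post, unchanged.
PATTERN = re.compile(r"(?P<author>[^_]+) (?P<published>\d\d\d\d) - (?P<title>.+)")

FILENAMES = [
    "Chernick, et al 2011 - The Impact of the Great Recession and the "
    "Housing Crisis on the Financing of America's Largest Cities.pdf",
    "Dalgaard 2008 - Introductory Statistics with R.pdf",
    "Shrader-Frechette 2001 - MacIntyre on Human Rights.pdf",
    "Theus & Lauer 1999 - Visualizing Loglinear Models.pdf",
    "Verzani - SimpleR.pdf",
    "1766_1103_CLT Reader Part I.pdf",
    "Urban Stud-2007-Butler-1161-74.pdf",
]


def parse(filename):
    """Return the named groups as a dict, or None if the pattern does not match."""
    stem = filename.rsplit(".", 1)[0]  # assumption: the extension is dropped first
    match = PATTERN.match(stem)
    return match.groupdict() if match else None


results = {name: parse(name) for name in FILENAMES}
for name, fields in results.items():
    print("OK  " if fields else "MISS", name)
```

With these seven names, the first four match and the last three do not. The first one still carries ", et al" inside the author group.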
#8
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Code:
(?P<author>[^_]+)(,? et al)? (?P<published>\d{4})? - (?P<title>.+)
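To see how the optional groups in this expression behave, it can be run against two of the example filenames with Python's `re`. This is a sketch, assuming the extension has already been removed. The `?` after the year makes only the digits optional, not the spaces around them, so a name such as "Verzani - SimpleR" (one space before the hyphen) would need two spaces there to match.

```python
import re

# The expression from the post, unchanged.
PATTERN = re.compile(
    r"(?P<author>[^_]+)(,? et al)? (?P<published>\d{4})? - (?P<title>.+)"
)

with_year = PATTERN.match("Dalgaard 2008 - Introductory Statistics with R")
no_year = PATTERN.match("Verzani - SimpleR")

print(with_year.groupdict() if with_year else None)
print(no_year)  # None: the literal space before the optional year is still required
```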
#9
Well trained by Cats
Posts: 30,925
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
You are correct.
The calibre install only uses sudo during the command-line install (out-of-date repository versions inherit sudo from the package manager, e.g. Synaptic). Calibre creates the menu/desktop items as part of the install and sets the 'rights' for them to the user.
#10
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
Code:
(?P<author>[^_]+)(,? ?e?t? ?a?l?) (?P<published>\d{4}) - (?P<title>[^_]+)
-or-
(?P<author>[^_]+)(, et al|) (?P<published>\d{4}) - (?P<title>[^_]+)
Meanwhile, I understand that some manual work will be necessary; I was just throwing in the last two filenames so you could see what I was working with. The above expression brings those in as author=Unknown and title=the complete filename, which is as good as I can expect.
Thanks,
--Lee
#11
Member
Posts: 10
Karma: 10
Join Date: May 2011
Device: Kindle DX
I was wrong, neither of those expressions is stripping out ", et al".
--Lee
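One likely reason, at least under Python's `re` semantics, is that `[^_]+` is greedy: it consumes ", et al" itself, and the group that follows is allowed to match nothing. The sketch below compares the second expression from the earlier post with a variant that makes the author group non-greedy (`+?`). The variant and the names `GREEDY`, `LAZY` and `author_of` are illustrative assumptions; the variant was not proposed in the thread and has not been tested in calibre.

```python
import re

# Second expression from the earlier post, unchanged.
GREEDY = re.compile(
    r"(?P<author>[^_]+)(, et al|) (?P<published>\d{4}) - (?P<title>[^_]+)"
)
# Hypothetical variant: non-greedy author, so ", et al" is left for the optional group.
LAZY = re.compile(
    r"(?P<author>[^_]+?)(?:,? et al)? (?P<published>\d{4}) - (?P<title>[^_]+)"
)


def author_of(pattern, stem):
    """Return the captured author, or None if the pattern does not match."""
    match = pattern.match(stem)
    return match.group("author") if match else None


samples = [
    "Chernick, et al 2011 - The Impact of the Great Recession",
    "Theus & Lauer 1999 - Visualizing Loglinear Models",
]
for stem in samples:
    print(repr(author_of(GREEDY, stem)), "->", repr(author_of(LAZY, stem)))
```

The non-greedy group stops at the shortest author that lets the rest of the pattern match, which is why joint authors such as "Theus & Lauer" are still captured whole.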
#12
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3