Old 05-16-2011, 04:16 AM   #1
leehach
Creating metadata pre/post-import

Hello all!

I'm a new user intending to use calibre with my new Kindle DX. I have a large library of PDFs from academic journals that I would like to pull into Calibre. The naming format for most of my files is "Author(s)[, et al] YYYY - Title.pdf"

I already tried importing once and realized calibre was reading the PDFs' embedded metadata, which was pretty ugly, so I blew it away and am trying again, restricting it to read metadata from filenames. There seem to be several ways to create the metadata.

1) Create the metadata on import using Preferences→Add/Save. I read in another thread that only the five fields that appear in that dialog can be imported this way, so I can't pull in the publication year, which is included in my naming scheme. I can at least prevent the year from appearing in the author field with
Code:
(?P<author>.+) \d\d\d\d - (?P<title>[^_]+)
I haven't figured out how to eliminate the ", et al" that appears in some filenames. If I try
Code:
(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)
I end up with no matches for the author field.

2) Import the entire filename as the document title, then try to parse the fields using the bulk metadata edit. However, when I open the "Edit metadata in bulk" dialog, I can't relate what I see to what I'm reading in that documentation link. I'm running 0.6.42 because that's the version in the Ubuntu Lucid repository. Is this functionality only available in a later version? If not, can someone help me figure out what to do?

3) Since I'm experienced with SQL, I'm thinking about importing the entire filename as the document title, then using SQL commands to pull out the year, authors, eliminate ", et al", etc.

Anyway, any comments on your experiences, and preferred way to accomplish this, would be appreciated.

--Lee
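
The difference between the two patterns in option 1 can be reproduced outside calibre. This is a minimal sketch, assuming calibre applies the pattern with Python's `re.search` to the filename without its extension (the `(?P<name>...)` syntax is Python's). The `strip_ext` helper, the variable names and the `ET_AL_WITH_SPACE` variant are made up for illustration, and the two sample names are taken from later in the thread.

```python
import os
import re

def strip_ext(filename):
    # Assumption: the pattern is matched against the name without ".pdf".
    return os.path.splitext(filename)[0]

# First pattern from the post: keeps the year out of the author field.
YEAR_ONLY = r"(?P<author>.+) \d\d\d\d - (?P<title>[^_]+)"
# Second pattern from the post: note there is no space after the comma.
ET_AL_NO_SPACE = r"(?P<author>.+),et al \d\d\d\d - (?P<title>[^_]+)"
# Same pattern with the space restored (illustrative, not from the thread).
ET_AL_WITH_SPACE = r"(?P<author>.+), et al \d\d\d\d - (?P<title>[^_]+)"

with_et_al = strip_ext(
    "Chernick, et al 2011 - The Impact of the Great Recession and the "
    "Housing Crisis on the Financing of America's Largest Cities.pdf"
)
without_et_al = strip_ext("Dalgaard 2008 - Introductory Statistics with R.pdf")

for label, pattern in [
    ("year only", YEAR_ONLY),
    ("',et al' (no space)", ET_AL_NO_SPACE),
    ("', et al' (with space)", ET_AL_WITH_SPACE),
]:
    for name in (with_et_al, without_et_al):
        m = re.search(pattern, name)
        # Print the captured author, or flag that the pattern did not match.
        print(label, "|", name[:20], "|", m.group("author") if m else "no match")
```

With the space missing, nothing in a name like "Chernick, et al 2011 - ..." can match the literal `,et al`, which would explain the empty author field. Restoring the space captures `Chernick` alone, but names without ", et al" then stop matching, so the suffix has to be optional rather than required.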
Old 05-16-2011, 04:31 AM   #2
Manichean
Quote:
Originally Posted by leehach View Post
I'm running 0.6.42 because that's the version in the Ubuntu Lucid repository. Is this functionality only available in a later version? If not, can someone help me figure out what to do?
The current version is 0.8.1; the functionality you want was added somewhere in 0.7.x. Uninstall the repository version and reinstall using the instructions here.
Old 05-16-2011, 04:49 PM   #3
leehach
Any advice on (a) a regular expression to weed out ", et al", and (b) choosing between parsing metadata during import vs. using the bulk metadata editor post-import? Regarding (b), it seems that the bulk editor is far more flexible.

I strongly prefer to use repositories to maintain my applications (including PPAs in addition to official repositories), but there's a big gap between 0.6.42 and 0.8.1, so it looks like I'll use the ad hoc install.

--Lee
Old 05-16-2011, 04:53 PM   #4
theducks
Quote:
Originally Posted by leehach View Post
Any advice on (a) a regular expression to weed out ", et al", and (b) choosing between parsing metadata during import vs. using the bulk metadata editor post-import? Regarding (b), it seems that the bulk editor is far more flexible.

I strongly prefer to use repositories to maintain my applications (including PPAs in addition to official repositories), but there's a big gap between 0.6.42 and 0.8.1, so it looks like I'll use the ad hoc install.

--Lee
Use the command-line install.
I even set up a menu item in Ubuntu to do the grunt work (I still need to supply the sudo password and press Enter twice).
Old 05-16-2011, 05:52 PM   #5
Manichean
Quote:
Originally Posted by leehach View Post
Any advice on (a) a regular expression to weed out ", et al"
Paste in a few filenames showing the variations you have (including with no et al.) and I'll have a look at it tomorrow.
Old 05-16-2011, 06:29 PM   #6
leehach
Yeah, I always manually create a menu item for installers that don't do it automatically. But does this mean calibre has to be run with sudo? I notice that the version installed from repository does not use sudo in the automatically created menu item. Why is this necessary?
Old 05-16-2011, 06:41 PM   #7
leehach
Manichean, here are some example file names:

Chernick, et al 2011 - The Impact of the Great Recession and the Housing Crisis on the Financing of America's Largest Cities.pdf
Dalgaard 2008 - Introductory Statistics with R.pdf
Shrader-Frechette 2001 - MacIntyre on Human Rights.pdf
Theus & Lauer 1999 - Visualizing Loglinear Models.pdf
Verzani - SimpleR.pdf
1766_1103_CLT Reader Part I.pdf
Urban Stud-2007-Butler-1161-74.pdf

The first filename uses "et al". The following ones include an author with a hyphenated name, joint authors, and a filename with no publication date. The last two show that some files in my library still have odd filenames, probably the ones given by the website I downloaded them from.

I tested
Code:
(?P<author>[^_]+) (?P<published>\d\d\d\d) - (?P<title>.+)
using 0.8.1 and the published date came in nicely (though set to current month-day, e.g. 5/16, of the specified publication year, not a big deal). Authors with et al get an author_sort field of "al Author, et".

Thanks,
--Lee
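
For reference, the expression tested in this post can be run against the seven sample names with Python's `re` module. This is a rough sketch, assuming calibre uses `re.search` on the name without its extension; the `parse` helper and `SAMPLES` list are illustrative names, not anything calibre provides.

```python
import re

PATTERN = re.compile(r"(?P<author>[^_]+) (?P<published>\d\d\d\d) - (?P<title>.+)")

# Sample names from the post, with the ".pdf" extension dropped
# (assumption: the pattern is matched against the bare name).
SAMPLES = [
    "Chernick, et al 2011 - The Impact of the Great Recession and the "
    "Housing Crisis on the Financing of America's Largest Cities",
    "Dalgaard 2008 - Introductory Statistics with R",
    "Shrader-Frechette 2001 - MacIntyre on Human Rights",
    "Theus & Lauer 1999 - Visualizing Loglinear Models",
    "Verzani - SimpleR",
    "1766_1103_CLT Reader Part I",
    "Urban Stud-2007-Butler-1161-74",
]

def parse(name):
    """Return the named groups as a dict, or None when the pattern does not match."""
    m = PATTERN.search(name)
    return m.groupdict() if m else None

for name in SAMPLES:
    # Show a shortened name next to whatever the pattern extracted.
    print(name[:30], "->", parse(name))
```

In this sketch the four names that contain a year match and the other three do not. The `author` group for the first name still carries ", et al", which is presumably where the odd author_sort value comes from.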
Old 05-17-2011, 05:29 AM   #8
Manichean
Quote:
Originally Posted by leehach View Post
Yeah, I always manually create a menu item for installers that don't do it automatically. But does this mean calibre has to be run with sudo? I notice that the version installed from repository does not use sudo in the automatically created menu item. Why is this necessary?
I haven't run Calibre on Linux, but I believe you only need SU privileges when installing a new version.

Quote:
Originally Posted by leehach View Post
Chernick, et al 2011 - The Impact of the Great Recession and the Housing Crisis on the Financing of America's Largest Cities.pdf
Dalgaard 2008 - Introductory Statistics with R.pdf
Shrader-Frechette 2001 - MacIntyre on Human Rights.pdf
Theus & Lauer 1999 - Visualizing Loglinear Models.pdf
Verzani - SimpleR.pdf
1766_1103_CLT Reader Part I.pdf
Urban Stud-2007-Butler-1161-74.pdf
Try something like
Code:
(?P<author>[^_]+)(,? et al)? (?P<published>\d{4})? - (?P<title>.+)
That should at least give you the possible fields from all the files except the last two. I don't know what can be done about those without getting all horribly complicated. It'd probably be easier to just accept that you'll have to do a certain amount of manual work on your metadata.
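
One thing worth checking before relying on this pattern is how the greedy `[^_]+` interacts with the optional groups. This is a minimal sketch using Python's `re` module, assuming calibre applies the pattern with `re.search` to the name without its extension; the variable names are illustrative.

```python
import re

SUGGESTED = re.compile(
    r"(?P<author>[^_]+)(,? et al)? (?P<published>\d{4})? - (?P<title>.+)"
)

m = SUGGESTED.search(
    "Chernick, et al 2011 - The Impact of the Great Recession and the "
    "Housing Crisis on the Financing of America's Largest Cities"
)
# [^_]+ is greedy, so it takes ", et al" itself; the optional group
# (group 2) is then left with nothing to match.
print(repr(m.group("author")), repr(m.group(2)), repr(m.group("published")))

# Without a year, the pattern still expects a space on each side of the
# optional digits, i.e. two spaces before the hyphen.
no_year = SUGGESTED.search("Verzani - SimpleR")
print(no_year)
```

Under these assumptions the author comes back with ", et al" still attached, and the dateless "Verzani - SimpleR" does not match at all because it has only one space before the hyphen.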
Old 05-17-2011, 11:32 AM   #9
theducks
You are correct.
The Calibre install only uses sudo during the command-line install (out-of-date repository versions inherit sudo from the package manager, e.g. Synaptic).

Calibre creates the menu/desktop items as part of the install and sets the 'rights' for them to the user.
Old 05-17-2011, 12:16 PM   #10
leehach
Quote:
Originally Posted by Manichean View Post
I haven't run Calibre on Linux, but I believe you only need SU privileges when installing a new version.
Great, just installed 0.8.1. I'll test it and see how it goes.

Quote:
Originally Posted by Manichean View Post
Try something like
Code:
(?P<author>[^_]+)(,? et al)? (?P<published>\d{4})? - (?P<title>.+)
That should at least give you the possible fields from all the files except the last two. I don't know what can be done about those without getting all horribly complicated. It'd probably be easier to just accept that you'll have to do a certain amount of manual work on your metadata.
The et al group had no effect. This gave me an idea, though. Both of the following work; does one or the other seem "safer" to you?
Code:
(?P<author>[^_]+)(,? ?e?t? ?a?l?) (?P<published>\d{4}) - (?P<title>[^_]+)
  -or-
(?P<author>[^_]+)(, et al|) (?P<published>\d{4}) - (?P<title>[^_]+)
These both successfully strip out ", et al", while not interfering when ", et al" was not present.

Meanwhile, I understand that some manual work will be necessary, just throwing in the last two filenames so you could see what I was working with. The above expression brings those in as author=Unknown and title=the complete filename, which is as good as I can expect.

Thanks,
--Lee
Old 05-17-2011, 12:21 PM   #11
leehach
I was wrong, neither of those expressions is stripping out ", et al".

--Lee
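
A likely reason neither expression strips the suffix is that `[^_]+` is greedy: it swallows ", et al" before the optional group gets a chance, and the group then matches the empty string. Below is a minimal sketch of that behaviour and of one possible workaround (a lazy `[^_]+?`), using Python's `re` module. The `LAZY` and `LAZY_OPTIONAL_YEAR` patterns and the `author_of` helper are not from the thread and have only been tried against the sample names here. How calibre treats a missing `published` group is not something this sketch checks.

```python
import re

# The two alternatives from post #10.
ALT_1 = r"(?P<author>[^_]+)(,? ?e?t? ?a?l?) (?P<published>\d{4}) - (?P<title>[^_]+)"
ALT_2 = r"(?P<author>[^_]+)(, et al|) (?P<published>\d{4}) - (?P<title>[^_]+)"
# Hypothetical variant: lazy author, so the optional suffix group can match.
LAZY = r"(?P<author>[^_]+?)(?:,? et al)? (?P<published>\d{4}) - (?P<title>.+)"
# Same idea, with the leading space folded into an optional year.
LAZY_OPTIONAL_YEAR = (
    r"(?P<author>[^_]+?)(?:,? et al)?(?: (?P<published>\d{4}))? - (?P<title>.+)"
)

def author_of(pattern, name):
    """Return the captured author, or None if the pattern does not match."""
    m = re.search(pattern, name)
    return m.group("author") if m else None

# Sample names from the thread, extensions dropped.
SAMPLES = [
    "Chernick, et al 2011 - The Impact of the Great Recession and the "
    "Housing Crisis on the Financing of America's Largest Cities",
    "Dalgaard 2008 - Introductory Statistics with R",
    "Theus & Lauer 1999 - Visualizing Loglinear Models",
    "Verzani - SimpleR",
    "Urban Stud-2007-Butler-1161-74",
]

for name in SAMPLES:
    # One row per filename: the author each pattern extracts (or None).
    print([author_of(p, name) for p in (ALT_1, ALT_2, LAZY, LAZY_OPTIONAL_YEAR)])
```

In this sketch both original alternatives return "Chernick, et al" for the first name, while the lazy variants return "Chernick". The optional-year variant also picks up "Verzani - SimpleR", which the others skip.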
Old 05-17-2011, 12:23 PM   #12
Manichean
Quote:
Originally Posted by leehach View Post
Code:
(?P<author>[^_]+)(,? ?e?t? ?a?l?) (?P<published>\d{4}) - (?P<title>[^_]+)
  -or-
(?P<author>[^_]+)(, et al|) (?P<published>\d{4}) - (?P<title>[^_]+)
I'd use the second, albeit for purely aesthetic reasons.