09-13-2009, 07:31 AM   #1
ChristianQ
Junior Member
ChristianQ began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Sep 2009
Device: DR1000S
[Old Thread] Extract ISBN from file name

I've got hundreds of e-books named ISBN.pdf.
How do I set the "Regular expression" on the "Adding books" page of Preferences?

Thank you.

11-04-2010, 09:55 AM   #2
grandin
Junior Member
grandin began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: Kindle 3
Hi,

Did you ever figure out a good regex for this? I have a ton of filenames that begin with the ISBN, followed by the publisher and title. I doubt I'll be able to parse the publisher out of the title, but I figure if I can at least grab the ISBN metadata, I can pull down all the rest.

Thanks,

G

11-04-2010, 10:04 AM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by grandin
Hi,

Did you ever figure out a good regex for this? I have a ton of filenames that begin with the ISBN, followed by the publisher and title. I doubt I'll be able to parse the publisher out of the title, but I figure if I can at least grab the ISBN metadata, I can pull down all the rest.

Thanks,

G
Post a few sample filenames, and someone will help you with a regex. Make sure you select samples that show each format you have for the titles. There's no reason you can't parse out the publisher, but if your ISBNs are good and those numbers are in the online databases, fetching metadata will overwrite the publisher, title and author anyway.
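Since the bulk fetch only helps when the ISBNs are good, one cheap pre-check is the ISBN check digit. This Python sketch (the helper name is_valid_isbn is hypothetical, not part of calibre) applies the standard checksum rules: an ISBN-10's digits weighted 10 down to 1 must sum to a multiple of 11, with a final X counting as 10, and an ISBN-13's digits weighted alternately 1 and 3 must sum to a multiple of 10. A passing checksum only rules out typos; it does not guarantee the number is in any online database.

```python
import re


def is_valid_isbn(raw: str) -> bool:
    """Return True if raw is a checksum-valid ISBN-10 or ISBN-13.

    Hyphens and spaces are ignored; a final 'X' in an ISBN-10 stands for 10.
    """
    s = raw.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        values = [10 if c == "X" else int(c) for c in s]
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


for isbn in ["0262083558", "041530329X", "9780521874878", "0262083559"]:
    print(isbn, is_valid_isbn(isbn))
```

With ISBNs taken from this thread, the first three pass; the last one has its check digit altered by one and fails.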

11-04-2010, 10:16 AM   #4
grandin
Junior Member
grandin began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: Kindle 3
Thanks, Starson.

I figured that a good ISBN would let me recover the relevant title, author, and publisher data, whether or not it was already in the filename. I'm willing to try that brute-force method before going so far as to parse the other elements with a regex.

Here are a couple of filenames:
0262083558.The.MIT.Press.Ham.Radios.Technical.Culture.Dec.2006.pdf
0520233085.University.of.California.Press.The.Horse.and.Jockey.from.Artemision.A.Bronze.Equestrian.Monument.of.the.Hellenistic.Period.Jul.2004.pdf
041530329X.Routledge.Politics.The.Basics.Jul.2004.pdf

Mostly academic titles, all from the same source.

Many thanks to whoever can lend a hand.

11-04-2010, 10:38 AM   #5
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by grandin
Thanks, Starson.

I figured that a good ISBN would let me recover the relevant title, author, and publisher data, whether or not it was already in the filename. I'm willing to try that brute-force method before going so far as to parse the other elements with a regex.

Here are a couple of filenames:
0262083558.The.MIT.Press.Ham.Radios.Technical.Culture.Dec.2006.pdf
0520233085.University.of.California.Press.The.Horse.and.Jockey.from.Artemision.A.Bronze.Equestrian.Monument.of.the.Hellenistic.Period.Jul.2004.pdf
041530329X.Routledge.Politics.The.Basics.Jul.2004.pdf

Mostly academic titles, all from the same source.

Many thanks to whoever can lend a hand.
Try this:
Code:
(?P<isbn>.+?)\.(?P<title>.+)
You can't easily get a correct title or publisher given the format in the filenames. Just let it overwrite during a bulk metadata fetch.
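To see what this pattern captures, here is a small sketch using Python's re module (the (?P<name>...) syntax is Python's named-group syntax). It assumes the extension is dropped before matching; parse_filename is a hypothetical helper, not calibre code.

```python
import re

# The pattern posted above: a lazy ISBN group up to the first dot,
# then everything after that dot as the title.
PATTERN = re.compile(r"(?P<isbn>.+?)\.(?P<title>.+)")


def parse_filename(filename: str) -> dict:
    # Assumption: the extension (".pdf") is removed before the pattern is applied.
    stem = filename.rpartition(".")[0]
    match = PATTERN.search(stem)
    return match.groupdict() if match else {}


for name in [
    "0262083558.The.MIT.Press.Ham.Radios.Technical.Culture.Dec.2006.pdf",
    "041530329X.Routledge.Politics.The.Basics.Jul.2004.pdf",
]:
    print(parse_filename(name))
```

For the first sample the isbn group is 0262083558 and the title group is The.MIT.Press.Ham.Radios.Technical.Culture.Dec.2006, publisher and date included. That is why letting the metadata fetch overwrite the title is the easier route.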

11-08-2010, 08:40 AM   #6
vne
Member
vne began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Oct 2010
Device: none
I suggest using another program called "ISBN renamer" to rename the files to ISBN.pdf. After that, just proceed with Calibre. "ISBN renamer" reads the first xx pages of the book to find the ISBN (you choose xx) and then looks up the info on Amazon to rename the book.
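The thread doesn't say how ISBN renamer finds the number, so the following is only an illustration of the general idea: scan the text of the first pages for an "ISBN" label followed by a hyphenated run of digits. It assumes the page text has already been extracted to a string (PDF text extraction is out of scope here); the pattern and the find_isbns name are hypothetical.

```python
import re

# Hypothetical illustration only, not how ISBN renamer actually works:
# "ISBN", an optional "-10"/"-13" and colon, then a hyphenated digit run
# ending in a digit or X.
ISBN_IN_TEXT = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9-]{8,15}[0-9Xx])")


def find_isbns(text: str) -> list[str]:
    """Return the ISBN-like strings found in text, hyphens removed."""
    found = []
    for match in ISBN_IN_TEXT.finditer(text):
        digits = match.group(1).replace("-", "").upper()
        # Keep only runs with the length of an ISBN-10 or ISBN-13.
        if len(digits) in (10, 13):
            found.append(digits)
    return found


page_text = "First published 2004. ISBN 0-415-30329-X (pbk). ISBN-13: 978-0-521-87487-8"
print(find_isbns(page_text))
```

The pattern tolerates hyphens but not spaces inside the number, a deliberate simplification so that neighbouring numbers separated by spaces are not swallowed; the length check discards anything that is not 10 or 13 characters long.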

11-08-2010, 09:44 AM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by vne
I suggest using another program called "ISBN renamer" to rename the files to ISBN.pdf. After that, just proceed with Calibre. "ISBN renamer" reads the first xx pages of the book to find the ISBN (you choose xx) and then looks up the info on Amazon to rename the book.
There's no need to use any other software. He's already got the ISBN in the filename, so he doesn't need to read any pages to find it. Using the regex I posted, it will bring the ISBN number into Calibre, and he can then automatically find author/title/publisher/ratings/etc. from Amazon and other sites with a bulk metadata fetch.

11-08-2010, 10:38 AM   #8
vne
Member
vne began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Oct 2010
Device: none
Quote:
Originally Posted by Starson17
There's no need to use any other software. He's already got the ISBN in the filename, so he doesn't need to read any pages to find it. Using the regex I posted, it will bring the ISBN number into Calibre, and he can then automatically find author/title/publisher/ratings/etc. from Amazon and other sites with a bulk metadata fetch.
Yes, it works fine in this case.

I'm wondering how to fetch metadata from Amazon; it seems that only Google Books and isbndb are available.

11-08-2010, 10:42 AM   #9
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by vne
I'm wondering how to fetch metadata from Amazon; it seems that only Google Books and isbndb are available.
Whatever is available from Amazon is already being picked up via the Amazon plugin (unless you've turned it off). I did read that one of the sources (possibly Amazon?) has set strict limits recently on the number of fetches allowed.

11-08-2010, 10:53 AM   #10
vne
Member
vne began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Oct 2010
Device: none
Quote:
Originally Posted by Starson17
Whatever is available from Amazon is already being picked up via the Amazon plugin (unless you've turned it off). I did read that one of the sources (possibly Amazon?) has set strict limits recently on the number of fetches allowed.
I remember reading somewhere that Calibre doesn't read metadata from Amazon because Amazon's policy prohibits copying metadata from its site without generating traffic to it (though I know of at least two programs that still do this: the free one is ISBN renamer, the commercial one is Book Collector).

11-08-2010, 11:23 AM   #11
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by vne
I remember reading somewhere that Calibre doesn't read metadata from Amazon
The 0.7.27 What's New says:
"Amazon metadata download plugin: Make it more robust and add option to auto convert HTML to text"

There's been an Amazon plugin forever. The interaction between the various metadata source plugins, and what comes from where, is not always obvious.

11-09-2010, 09:04 PM   #12
vne
Member
vne began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Oct 2010
Device: none
Quote:
Originally Posted by Starson17
The 0.7.27 What's New says:
"Amazon metadata download plugin: Make it more robust and add option to auto convert HTML to text"

There's been an Amazon plugin forever. The interaction between the various metadata source plugins, and what comes from where, is not always obvious.
I still don't think that Calibre reads metadata from Amazon. When fetching metadata from the server, Calibre says "Calibre can find metadata for your books from two locations: Google Books and isbndb.com".

So far, calibre finds the following fields: title, author, tags, publisher, rating, series and published date. I think if calibre read metadata from Amazon, some more fields would also be available, e.g. number of pages and edition.

Last edited by vne; 11-09-2010 at 09:07 PM.

11-12-2010, 09:38 AM   #13
frulex
Junior Member
frulex began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Nov 2010
Device: none
I'm having a similar problem to Grandin's above. I have a lot of books named like this (ISBN only):
041530329X.pdf
9780521874878.pdf

but I can't get Calibre (0.7.27) to pick up the ISBN from the filename. I tried the suggestion already mentioned, (?P<isbn>.+?)\.(?P<title>.+), and unticked "Read metadata...", but still no luck...

Any other ideas?
tnx...

11-12-2010, 10:11 AM   #14
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by frulex
I'm having a similar problem to Grandin's above. I have a lot of books named like this (ISBN only):
041530329X.pdf
9780521874878.pdf

but I can't get Calibre (0.7.27) to pick up the ISBN from the filename. I tried the suggestion already mentioned, (?P<isbn>.+?)\.(?P<title>.+), and unticked "Read metadata...", but still no luck...

Any other ideas?
tnx...
Your files aren't named the same as his. He always had a dot after the ISBN, followed by the title. Normally, I'd say to try this:
(?P<isbn>.+)

However, IIRC, Calibre is not happy if you don't give it a title or author in the regex. One solution is to just add the books, then select them all and use Edit Metadata (bulk) with its Search and Replace feature to copy the ISBN from the title into the ISBN field.
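A short Python sketch of why the dotted pattern fails on these names, plus one untested alternative. It assumes the .pdf extension is stripped before matching, which is consistent with the failure frulex describes. The alternative nests an isbn group inside a title group so that the bare number fills both; whether calibre's Adding books screen accepts nested groups is an assumption, so the bulk Search and Replace route remains the safer option. The names DOTTED, BARE and try_patterns are hypothetical.

```python
import re

# The pattern suggested earlier: it needs a dot after the ISBN.
DOTTED = re.compile(r"(?P<isbn>.+?)\.(?P<title>.+)")
# Hypothetical alternative: one ISBN-10 or ISBN-13, captured as both title and
# isbn via a nested group. Untested in calibre itself.
BARE = re.compile(r"(?P<title>(?P<isbn>[0-9]{9}[0-9Xx]|[0-9]{13}))")


def try_patterns(filename: str) -> tuple:
    # Assumption: the extension is stripped before matching.
    stem = filename.rpartition(".")[0]
    dotted = DOTTED.search(stem)
    bare = BARE.fullmatch(stem)
    return (dotted is not None, bare.groupdict() if bare else None)


for name in ["041530329X.pdf", "9780521874878.pdf"]:
    print(name, try_patterns(name))
```

For both of frulex's names the dotted pattern finds no match, because after the extension is removed there is no dot left, while the nested pattern returns the same number as title and isbn.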

11-12-2010, 10:32 AM   #15
grandin
Junior Member
grandin began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: Kindle 3
Thanks!

Starson17 - Thanks for the regex! Worked perfectly.

Frulex - Starson17's suggestion should work just fine, but one tip: make sure that under "Adding books" the option to check file metadata is not ticked. This way the info will be pulled directly from the filename with no margin of error. After that, download the metadata in a batch and you'll be right as rain.

vne - I appreciate the ISBN renamer tip. That will come in very handy with another pack I have, where the books are .txt files, the filename is the title, but the author is the folder name! Thus far it has been impossible to fix, and as I don't know any scripting languages, this tool might be the best way.

Cheers all around!
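For the .txt pack, a short script may be less work than it sounds. This Python sketch, with hypothetical helper names, assumes the layout <Author>/<Title>.txt and only plans renames to "Title - Author.txt" without touching any files. Names in that form could then be parsed by an Adding books regex with title and author groups, like the one in the code; this has not been tried against calibre itself.

```python
import re
from pathlib import Path

# A title/author pattern of the kind the Adding books regex box takes;
# here it only checks that the proposed names split back correctly.
TITLE_AUTHOR = re.compile(r"(?P<title>.+) - (?P<author>.+)")


def proposed_name(path: Path) -> str:
    """Hypothetical helper: <Author>/<Title>.txt becomes 'Title - Author.txt'."""
    return f"{path.stem} - {path.parent.name}{path.suffix}"


def plan_renames(root: str) -> list[tuple[Path, Path]]:
    """Dry run: pair every .txt file under root with its proposed new path."""
    return [(p, p.with_name(proposed_name(p))) for p in sorted(Path(root).rglob("*.txt"))]


example = Path("Jane Austen") / "Emma.txt"
new_name = proposed_name(example)
print(new_name, TITLE_AUTHOR.fullmatch(Path(new_name).stem).groupdict())
```

To review the plan, loop over plan_renames("/path/to/pack") (a placeholder path) and print each old and new pair; calling old.rename(new) in that loop would then perform the renames. The greedy title group splits at the last " - ", so titles containing " - " still parse, but author folder names containing " - " would not.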
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
[GUI Plugin] Extract ISBN | kiwidude | Plugins | 502 | 03-25-2024 06:40 AM
Extract ISBN from PDF? | mdroberts | Calibre | 14 | 12-16-2016 07:32 AM
[Old Thread] Bulk ISBN Removal | brewjono | Calibre | 8 | 05-04-2011 06:15 PM
[Old Thread] Auto Extract ISBN-Feature request | UnraisedArc | Calibre | 60 | 03-23-2011 09:31 AM
[Old Thread] ISBN in List view | muppetgeoff | Library Management | 6 | 02-15-2011 08:35 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.