Getting Metadata from Amazon

LJJohnson · 12-20-2016, 02:23 PM

I get most of my books from Amazon, and they seem to have the best selection of metadata (for ebooks). By best I mean not only the data itself, but being able to use the initial data to do a global search & replace (for example, converting all the various forms of Science Fiction to "SFF__"). Some parts of their data are just silly, and I periodically do a search-and-delete on them (such as "Two Hours or More (65-100 Pages)" and its cousins). I do a lot of editing on metadata, and have a lot of custom tags, but in many cases I'm dependent, at least initially, on the original Amazon data as a starting point.

However, the metadata for IDs that is returned (i.e., mobi-asin:B01M33A032) has a very poor record of finding the correct book from Amazon when I hit [Download Metadata] ==> considerably less than 50%. It will bring back the wrong book, no books, or the book in dead-tree version, which normally has little metadata.

However, if I manually put in "Amazon:B01M33A032" in the IDs field, my success rate goes to 80% - 90%.

So, is there some script or add-in available to do this automatically on a selected subset of books? With 500+ new books from the recent Open Roads giveaway, I really don't want to do this manually. My goal is to select the 500+ books, run a script to add the "Amazon:B0xxxxxxxx" automagically, then use [CTRL-D] to download metadata only and get the correct Amazon ebook metadata most of the time.

Thanks.

kovidgoyal · 12-20-2016, 09:14 PM

Metadata download does not use mobi-asin as there is no way to know which country store a mobi-asin corresponds to. You can use the search and replace feature of the bulk metadata edit dialog to mass convert the mobi-asin: to amazon: identifiers before running the download.
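
As a rough illustration of what that conversion does to each book's identifiers, here is a minimal standalone sketch. It works on plain Python dicts, not on calibre's database API; the function name `promote_mobi_asin`, the `keep_original` flag, and the sample library (including the made-up `google` entry) are assumptions for illustration only.

```python
def promote_mobi_asin(identifiers, target="amazon", keep_original=True):
    """Return a copy of `identifiers` with the mobi-asin value copied to `target`.

    `identifiers` is a plain dict such as {"mobi-asin": "B01M33A032"}.
    An existing `target` entry is never overwritten.
    """
    result = dict(identifiers)
    asin = result.get("mobi-asin")
    if asin and target not in result:
        result[target] = asin
    if not keep_original:
        # Drop the source identifier so the book carries only one ASIN entry.
        result.pop("mobi-asin", None)
    return result


# Hypothetical mini-library: book id -> identifiers.
books = {
    1: {"mobi-asin": "B01M33A032"},
    2: {"mobi-asin": "B00VH4AQIY", "amazon": "B00VH4AQIY"},
    3: {"google": "example-id"},  # no ASIN at all, left untouched
}
converted = {book_id: promote_mobi_asin(ids) for book_id, ids in books.items()}
```

With `keep_original=True` the book ends up with both a mobi-asin and an amazon entry; with `keep_original=False` only the amazon entry remains. For a non-US store, a country-specific key such as `target="amazon_de"` would presumably be the analogous choice, going by the `amazon_` prefix handling visible in the amazon.py diff later in this thread.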

nabsltd · 12-23-2016, 12:33 AM

Quote:

Originally Posted by LJJohnson

However, the metadata for IDs that is returned (i.e., mobi-asin:B01M33A032) has a very poor record of finding the correct book from Amazon when I hit [Download Metadata] ==> considerably less than 50%. It will bring back the wrong book, no books, or the book in dead-tree version, which normally has little metadata.

The "Overdrive Link" plugin is pretty much 100% accurate at grabbing the ASIN of an eBook from Amazon, and it will put it in any identifier you want, with the default being "amazon:".

audeojude · 12-26-2016, 07:00 PM

The last few days I have been having really spotty results downloading metadata from Amazon. One out of 10 books will actually download. These are books that I just downloaded from my Amazon account via computer and then added to calibre. They do not bring any metadata such as comments or tags with them; however, the books are good and I can read them with the book viewer. I then hit Ctrl+D after selecting all the books that I downloaded. Typically it just errors out on all of them. Sometimes I can pick a specific book, hit E for edit, and then download metadata from there. More often it isn't working there either.

Maybe later I will try again and it works. I read this thread and enabled the Overdrive plugin, and one of the three books I had been trying to download metadata for just moments before actually worked, but the other two did not. For something that was working well all the time just days ago, it is really frustrating. I'm thinking some of it is on Amazon's end, as I am getting this error sometimes. I'm not sure that the message about too many books is valid, as this time I was only trying to get metadata for three books. I can change my IP address and it will still give that error. It's just frustrating.

get_details failed for url: 'https://www.amazon.com/Darker-Element-Beyond-Godhunter-Book-ebook/dp/B00VH4AQIY/ref=sr_1_2/163-9040836-8902326?s=books&ie=UTF8&qid=1482796511&sr=1-2'
Traceback (most recent call last):
File "site-packages/calibre/ebooks/metadata/sources/amazon.py", line 297, in run
File "site-packages/calibre/ebooks/metadata/sources/amazon.py", line 310, in get_details
File "site-packages/calibre/ebooks/metadata/sources/amazon.py", line 315, in parse_details
CaptchaError: Amazon returned a CAPTCHA page, probably because you downloaded too many books. Wait for some time and try again.

************************************************** ******************************

Katja_hbg · 12-27-2016, 05:53 AM

Quote:

Originally Posted by kovidgoyal

Metadata download does not use mobi-asin as there is no way to know which country store a mobi-asin corresponds to. You can use the search and replace feature of the bulk metadata edit dialog to mass convert the mobi-asin: to amazon: identifiers before running the download.

I tried that. A simple search/replace did not work. With the identifier field selected, it ADDS amazon:nnn (with the proper number). This works perfectly with the German location.
Thanks for the hint.

My only question is whether it is really intended that the identifier ends up doubled, i.e. both mobi-asin and amazon, instead of being replaced.

Rev. Bob · 12-28-2016, 11:41 AM

Quote:

Originally Posted by audeojude

The last few days I have been having really spotty results downloading metadata from Amazon. One out of 10 books will actually download. These are books that I just downloaded from my Amazon account via computer and then added to calibre.

I'm getting this, too. If I select several books and try to grab their metadata all at once, I might get half of them successfully, and often those are from non-Amazon sources like Google. On the other hand, if I do 'em one by one, the success rate shoots up to 100% or close to it.

To be clear, these are all books I bought on Amazon US as Kindle-format ebooks, and I've used the "convert mobi-asin to amazon identifier" trick. Amazon should be giving metadata on all of them, but something about the bulk download attempt isn't working.

SteveB23 · 12-28-2016, 12:25 PM

I just started seeing this again as well. I saw it first back in July and there was some discussion then about the use of hard-coded vs random user-agents (https://www.mobileread.com/forums/sho...d.php?t=276443). But I installed the 2.75 Calibre update a few days ago and noticed this behavior again yesterday. The trick of changing the ID tag from mobi-asin:XXXX to amazon:XXXX seems to fix it, but I thought I'd bring up the user-agent issue and ask if that might also be involved.

kovidgoyal · 12-28-2016, 12:59 PM

There have been no changes to the user agent in any recent release.

Helmut G · 12-30-2016, 10:22 AM

Quote:

Originally Posted by kovidgoyal

Metadata download does not use mobi-asin as there is no way to know which country store a mobi-asin corresponds to. You can use the search and replace feature of the bulk metadata edit dialog to mass convert the mobi-asin: to amazon: identifiers before running the download.

This is undoubtedly true. However, as most users who buy ebooks from Amazon do so via one country-specific portal, it would be handy to be able to use this for initial metadata retrieval. After the initial retrieval, the correct country-specific amazon identifier will be set.
Of course this should be handled as an option of the amazon module, defaulting to false.
After playing around with this I came up with the following changes to amazon.py.
I know there are different ways to work around this, but this is really easy to use (at least if most of your ebooks come from a single Amazon portal that is available for selection in your amazon.py module).
Would you be willing to consider adding something like that to your upstream module?

Code:

--- a/src/calibre/ebooks/metadata/sources/amazon.py
+++ b/src/calibre/ebooks/metadata/sources/amazon.py
@@ -793,6 +793,9 @@ class Amazon(Source):
             Option('domain', 'choices', 'com', _('Amazon website to use:'),
                 _('Metadata from Amazon will be fetched using this '
                     'country\'s Amazon website.'), choices=AMAZON_DOMAINS),
+           Option('use_mobi_asin', 'bool', False,
+               _('use ebook-internal mobi-asin to match eBook'),
+               _('Match eBook on selected Amazon site using the mobi-asin identifier contained in most Amazon eBooks')),
             )

     def __init__(self, *args, **kwargs):
@@ -837,6 +840,8 @@ class Amazon(Source):
             key = key.lower()
             if key in ('amazon', 'asin'):
                 return 'com', val
+            if self.prefs['use_mobi_asin'] and key == 'mobi-asin':
+                return self.prefs['domain'], val
             if key.startswith('amazon_'):
                 domain = key.partition('_')[-1]
                 if domain and (domain in self.AMAZON_DOMAINS or domain in extra_domains):
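
To make the effect of the proposed option easier to follow, here is a simplified standalone sketch of the identifier-to-store lookup. It is not calibre's actual method: `resolve_asin`, `KNOWN_DOMAINS` and the parameter names are hypothetical. It also makes one choice the diff does not obviously make: mobi-asin is used only as a fallback once no explicit amazon identifier has been found, whereas the diff appears to test each key as it is encountered.

```python
# Hypothetical subset of store domains; the real plugin defines its own list.
KNOWN_DOMAINS = {"com", "de", "uk", "fr"}


def resolve_asin(identifiers, preferred_domain="com", use_mobi_asin=False):
    """Return (domain, asin) for the best usable identifier, or (None, None)."""
    mobi_asin = None
    for key, val in identifiers.items():
        key = key.lower()
        if key in ("amazon", "asin"):
            # A bare amazon/asin identifier is treated as the .com store.
            return "com", val
        if key.startswith("amazon_"):
            # Country-specific identifier, e.g. amazon_de -> "de".
            domain = key.partition("_")[-1]
            if domain in KNOWN_DOMAINS:
                return domain, val
        if key == "mobi-asin":
            # Remember it, but let explicit amazon identifiers win.
            mobi_asin = val
    if use_mobi_asin and mobi_asin:
        # The file does not say which store it came from, so assume
        # the store the user selected in the plugin options.
        return preferred_domain, mobi_asin
    return None, None
```

With the option off, a book that only carries a mobi-asin resolves to nothing, which matches the behavior described at the top of the thread. With the option on and `preferred_domain="de"`, the same book is looked up on the German store.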

kovidgoyal · 12-30-2016, 12:22 PM

Sure, I have no objection to adding such an option.

audeojude · 02-15-2017, 10:44 AM

Update on this for me. A few days after my last post on this it cleared up and worked fine, without me doing anything. About 4 days ago it started doing it again. The error message for Amazon indicates a captcha page blocking the query.

****************************** Amazon.com ******************************
Request extra headers: [('User-agent', u'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko')]
Found 0 results
Downloading from Amazon.com took 0.408433914185
Plugin Amazon.com failed
Traceback (most recent call last):
File "site-packages/calibre/ebooks/metadata/sources/identify.py", line 48, in run
File "site-packages/calibre/ebooks/metadata/sources/amazon.py", line 1163, in identify
File "site-packages/calibre/ebooks/metadata/sources/amazon.py", line 1073, in parse_results_page
CaptchaError: Amazon returned a CAPTCHA page, probably because you downloaded too many books. Wait for some time and try again.

************************************************** ******************************

I find the message confusing, since it is based on the number of books. I have been able to do 100 to 200 books at a time with no issues other than individual books it was unable to identify. Other times a single book will give this message. I have thought maybe it had something to do with sometimes running through my VPN service, which could have thousands of other users on the same IP, with too many of us collectively making similar requests. However, I can disconnect from the VPN and get identical results/errors.

The error makes me think it is an Amazon security feature causing the problem. However, in both instances it seemed to start and end not long after I updated calibre. Take this feeling with a huge grain of salt though, as it's only worth the digital ink I'm printing it with.

kovidgoyal · 02-15-2017, 11:15 AM

It is an Amazon anti-bot measure and it uses statistical techniques -- so it is impossible to predict/understand its behavior.
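
Given that unpredictability, the error message's advice to "wait for some time and try again" can only be sketched, not guaranteed to help. Below is a minimal, generic retry loop with exponential backoff and jitter. The `CaptchaError` class is a local stand-in for the exception named in the logs, `fetch` is a hypothetical callable that downloads metadata for one book, and the delay values are arbitrary assumptions.

```python
import random
import time


class CaptchaError(Exception):
    """Local stand-in for the error shown in the logs above."""


def fetch_with_backoff(fetch, book_id, attempts=4, base_delay=30.0, sleep=time.sleep):
    """Call fetch(book_id), waiting longer after each CaptchaError."""
    for attempt in range(attempts):
        try:
            return fetch(book_id)
        except CaptchaError:
            if attempt == attempts - 1:
                # Out of retries: let the caller see the failure.
                raise
            # Exponential backoff: base delay doubles each time,
            # plus up to 50% random jitter so retries are not evenly spaced.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Processing books one at a time through a wrapper like this is in the same spirit as the one-by-one approach reported earlier in the thread, but whether it actually avoids the CAPTCHA page is an open question.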

jeffls · 07-02-2017, 12:43 PM

I have searched these forums and still haven't seen a solution for the metadata tag download issues from Amazon. I can see the tags on the web page for the book, and if I experiment using the Amazon Product API, I can retrieve them (though this is less than ideal and took a fair amount of registration and setup to get going).

There are no errors in the metadata download log, but just no tags either. This gets fairly annoying as it makes adding medium-to-large sets of books very difficult and time-consuming. Is anyone else having this problem or is it just me?

kovidgoyal · 07-02-2017, 01:23 PM

The plugin does not support reading tags. IIRC, tags are loaded on the website using JavaScript.

jeffls · 07-02-2017, 05:44 PM

Ah, that makes perfect sense.
I've been looking through the docs and code available for plugins, but haven't figured out yet how to write a simple one that would take the book selection and execute my Python code that uses my Amazon API login information. It's tough looking through all the plugins to find one that's a good starting point.
