10-26-2015, 07:41 AM | #226 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Thanks for the reports. I'll give it another couple of days before I arrange for the release.
|
11-17-2015, 12:56 PM | #227 |
Junior Member
Posts: 5
Karma: 250
Join Date: May 2014
Device: Nook HD
|
Thank you very much for this. I've been searching off and on for a few weeks to find out why I was no longer getting comments from Goodreads. When I asked calibre to check for updates, it told me there were none available for the Goodreads plugin (not the Goodreads Sync one). I found davidfor's file and it fixed the problem. Yay!
|
11-20-2015, 03:38 AM | #228 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2015
Device: calibro
|
I have a problem with the Goodreads plugin: I can't download metadata from Goodreads. When I try to open a book's link through calibre, I get a 403 error. If I wait a little, it opens. Likewise, when I try to download metadata for multiple books, I can download metadata for 4-5 books and then it gives the same error. After waiting a little, I can download metadata for another 4-5 books.
|
11-20-2015, 04:43 AM | #229 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
A 403 error is "Forbidden". Unfortunately, I don't know what is forbidden about the request. But, as you had the error in the browser (clicking the link in calibre) and then in calibre, it isn't something specific to calibre. The only thing I can think of is that Goodreads is seeing too many requests from your IP and temporarily blocking them.
When it next happens, can you post the log? |
11-20-2015, 04:45 AM | #230 | |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
|
|
11-20-2015, 05:02 AM | #231 | |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2015
Device: calibro
|
Quote:
|
|
11-20-2015, 05:45 AM | #232 | |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Are you downloading metadata for one book at a time, or using the bulk download? If you are using the bulk download, can you try the single book? |
|
11-20-2015, 06:04 AM | #233 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2015
Device: calibro
|
When I click the link I get this page. It means 'Access denied to the Web page. You are not authorized. You may need to login.'
I've tried everything: one book, bulk download, re-downloading calibre, re-downloading the plugin, changing browser, changing my Goodreads user settings. When I click the link, sometimes it begins with http://, sometimes https://. I don't think I can solve this problem. |
11-20-2015, 06:51 AM | #234 | |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
If the error is happening in the browser, it probably means it is something in your connection. That error page doesn't look like it comes from Goodreads. Do you have a proxy between you and the web that you need to login to? Can you reach the Goodreads home page? Is using HTTPS different to HTTP? |
|
11-21-2015, 08:21 AM | #235 |
Zealot
Posts: 137
Karma: 2156958
Join Date: Jan 2013
Device: Too many random androids to list
|
These 403s have been happening for the last week, since Friday 13 November. It's a bit of a heisenbug, so I hadn't reported it. In any case, here's what I have figured out:
If you send any batch of more than about 5 requests at a time, you will probably hit this. Single requests don't ever seem to hit it on the first pull of a specific book. The bigger the batch you send, the more failures you'll get: if you request 20 books at a time, in 3 batches, queued into the job manager, you'll probably get all or most of the books in the first batch, 10-12 in the second batch, and none in the third. At this point, all requests to GR fail for the next while. I haven't yet narrowed down how long, but it's at least 15 minutes.
Once any specific book has failed, that book id will also pull a 403 in a browser. I've checked multiple browsers. Add any character to the end of the url and it'll work (a - will do; GR urls on the site usually include part of the title, but their webserver actually only responds to the book id, so any text after the book id will work). So it's only the bare book id that fails, and the server treats any variant as a separate, different request. All this leads me to believe it's probably something to do with rate-limiting. As I said, given enough time, any bare book id url will work again in the browser too, and once it does, you can again pull that book from the plugin.
Just to be super clear what I mean here, these are bare book id urls for books that were exhibiting this, but naturally aren't now:
http://www.goodreads.com/book/show/6902644
https://www.goodreads.com/book/show/45634
https://www.goodreads.com/book/show/553907
http vs https doesn't seem to matter; I specifically tried both.
Once it's blocked, it's apparently blocking my ip, because I tried one of those urls from my tablet on wifi (behind the same router as this pc, so same external ip address) and at the same time from my phone via 3g. The tablet failed, the phone worked.
ETA: this is specific book id by book id. When one specific id starts failing with 403s, the rest of the site still works, but as I mentioned, internal urls on GR are not bare ids; they always include some other text.
The fact that it starts to decline *all* metadata requests from the plugin after it has failed enough times is interesting, again implying rate limiting, since it affects either only bare book id urls or only api requests. When I added a - or any other character to those same urls during the period they were blocked with a 403 response, they worked fine. This is really quite hard to debug, since it's so inconsistent, but at the same time it's also quite repeatable if you throw enough books at GR.
Last edited by Krazykiwi; 11-21-2015 at 08:26 AM. Reason: Last friday wasn't the 14th, doh |
11-22-2015, 09:36 AM | #236 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.1.0 Released
Changes in this release:
Thanks to davidfor for making the changes. |
12-12-2015, 09:08 PM | #237 |
Member
Posts: 21
Karma: 104
Join Date: Oct 2013
Device: none
|
As noted by Krazykiwi, you can get 403 errors if you try to download "bare" goodreads book urls too many times. To investigate this issue further, I looked into the details of how the Goodreads Metadata Source Plugin works.
When you first try to download metadata for a book that doesn't have a "goodreads:" (or "isbn:") entry in the identifiers field, the plugin does a Goodreads search and then parses the HTML response to get the first matching book's url. This url is not a bare url, so it shouldn't trigger the 403 error. The next time you try to download metadata for that book, it will have a "goodreads:" identifier, so the plugin won't do a search and instead attempts to get the metadata by directly downloading www.goodreads.com/book/show/{TheGoodreadsID} (see __init__.py lines 114-115).
I speculate that this problem is more noticeable because the Description/Comments/Summary metadata broke recently and a new plugin version was required, so more people have been re-downloading Goodreads metadata for books that already have a "goodreads:" identifier.
You can fix this problem by changing the identify() method in __init__.py, line 115, to automatically do what Krazykiwi was doing manually. Just add a trailing "-" to the url, as in the following:
Code:
if goodreads_id:
    matches.append('%s/book/show/%s-' % (Goodreads.BASE_URL, goodreads_id))
Code:
result_url = Goodreads.BASE_URL + first_result_url_node[0]
Code:
log.info('First search results book url: %s' % result_url)
The plugin does not use the Goodreads API but instead scrapes the book's html page, so it's not limited to 1 request per second. I'm not sure why it's so slow (a custom C# metadata downloader I wrote can grab 500+ books in a few minutes). I didn't bother to figure this out, though, since it would probably unfairly load down the Goodreads servers. |
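For anyone who wants to see the URL change in isolation, here is a minimal standalone sketch of the trailing-dash idea from the post above. It is not the plugin's code: `BASE_URL` is an assumed stand-in for `Goodreads.BASE_URL`, and `book_url` is a hypothetical helper name.

```python
# Assumed stand-in for the plugin's Goodreads.BASE_URL; the real value may differ.
BASE_URL = 'https://www.goodreads.com'


def book_url(goodreads_id, suffix='-'):
    """Build a book page URL from a Goodreads id.

    With the default suffix the result is not a "bare" id URL, which is the
    form the posts above report as returning 403 after repeated requests.
    """
    return '%s/book/show/%s%s' % (BASE_URL, goodreads_id, suffix)


if __name__ == '__main__':
    print(book_url('45634'))             # URL with the trailing dash
    print(book_url('45634', suffix=''))  # bare id URL, for comparison
```

Passing `suffix=''` reproduces the original bare URL, which makes it easy to compare the two forms side by side.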
12-13-2015, 01:39 AM | #238 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
I doubt adding the dash to the end of the URL will really help. I think it is more likely that when the blocking is happening, the Goodreads site thinks this is a different URL. I would expect that the dashed URL could get blocked too, and then the URL without the dash would work, or you would need two dashes. You could automate this: try the URL and, if there was a 403, add the dash and try that. I don't like that, as there is a reason that Goodreads has blocked the URL and we probably should not sidestep that.
But, are people still seeing this problem? At the time it happened, there was another problem with getting related books. I was wondering if both were caused by a bad update to Goodreads. |
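The "try it and if there was a 403, add the dash and try that" idea could look roughly like the sketch below. It uses Python 3's standard `urllib` rather than whatever the plugin uses internally, and `fetch_with_dash_fallback` and its `fetch` parameter are hypothetical names introduced only for illustration. The reservation in the post still applies: this works around a block that Goodreads presumably put there for a reason.

```python
import urllib.error
import urllib.request


def fetch_with_dash_fallback(url, fetch=None):
    """Fetch url; if the server answers 403, retry once with '-' appended.

    fetch is any callable taking a URL and returning the page body. It
    defaults to a plain urllib download and can be swapped out for testing.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as response:
                return response.read()
    try:
        return fetch(url)
    except urllib.error.HTTPError as err:
        if err.code != 403:
            raise  # only the "Forbidden" case is retried
        # Single retry with the dashed variant; a second 403 propagates.
        return fetch(url + '-')
```

The retry happens only once, so a URL that is blocked in both forms still surfaces the 403 to the caller.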
12-13-2015, 12:19 PM | #239 | |
Member
Posts: 21
Karma: 104
Join Date: Oct 2013
Device: none
|
Quote:
To test your theory, I just downloaded metadata for 288 books that already had a "goodreads:" id using my patch. It took 17m:34s (3.7s per book) with no 403 errors, but I forgot to turn off the Amazon metadata plugin. Redoing it with just the Goodreads plugin took 6m:24s (1.3s per book) and failed for 2 books, but those were "No matches found with query" errors.
I then removed my patch and downloaded the metadata for the same 288 books. It took 3m:38s (0.76s per book) but only successfully downloaded metadata for 57 books and failed for 231 (with 229 "httperror_seek_wrapper: HTTP Error 403: Forbidden" errors). This matches the initial behavior that caused me to investigate the problem in the first place.
Just to be sure, I put my patch back in and redownloaded (6m:20s, 1.3s per book), and again successfully got metadata for all the books for which metadata exists. So it seems your theory is wrong? |
|
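The per-book figures quoted above can be rechecked with a few lines of arithmetic. The run labels and the helper name `seconds_per_book` are only illustrative; the durations and the 288-book count come straight from the post.

```python
def seconds_per_book(minutes, seconds, books=288):
    """Average wall-clock seconds per book for one download run."""
    return (minutes * 60 + seconds) / books


# (label, minutes, seconds) for the four runs described in the post
runs = [
    ('patched, Amazon plugin also on', 17, 34),
    ('patched, Goodreads only', 6, 24),
    ('unpatched, Goodreads only', 3, 38),
    ('patched again, Goodreads only', 6, 20),
]

if __name__ == '__main__':
    for label, m, s in runs:
        print('%-32s %.2f s/book' % (label, seconds_per_book(m, s)))
    # Share of the 288 books that failed in the unpatched run
    print('unpatched failure rate: %.0f%%' % (100.0 * 231 / 288))
```

The unpatched run is the fastest per book largely because most of its requests failed outright rather than downloading and parsing a page.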
12-15-2015, 12:46 PM | #240 |
Zealot
Posts: 137
Karma: 2156958
Join Date: Jan 2013
Device: Too many random androids to list
|
Could you make it add any random character at the end? Something like %s.%random-alpha-char% or %s-%randomalphachar ('scuse my lack of python-fu, but that ought to be a fairly simple little function even if it's not a built-in, right?). That would make every request be treated as "the first time".
|
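A random-letter suffix is indeed only a couple of lines with the standard library. This is a sketch of the suggestion, not something taken from the plugin: `book_url_with_random_suffix` is a hypothetical helper, and whether Goodreads really treats each variant as a fresh request is an untested assumption based on the observations earlier in the thread.

```python
import random
import string


def book_url_with_random_suffix(base_url, goodreads_id, rng=random):
    """Return a book URL ending in '-' plus one random lowercase letter.

    rng is anything with a choice() method; passing random.Random(seed)
    makes the result repeatable.
    """
    letter = rng.choice(string.ascii_lowercase)
    return '%s/book/show/%s-%s' % (base_url, goodreads_id, letter)


if __name__ == '__main__':
    print(book_url_with_random_suffix('https://www.goodreads.com', '45634'))
```

With only 26 possible letters the same variant will recur quickly in a large batch, so a longer random suffix may be needed if each variant really is tracked separately.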