#1 |
|
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Timeout when fetching metadata
My Kobo Books metadata source plugin stopped working recently. The report for it in the plugin's thread is here. Judging from the error, it is a timeout. The log I see when I fetch metadata is:
calibre, version 5.31.1
ERROR: No matches found: <p>Failed to find any books that match your search. Try making the search <b>less specific</b>. For example, use only the author's last name and a single distinctive word from the title.<p>To see the full log, click "Show details".
Code:
Running identify query with parameters:
{'title': 'The Great War and Modern Memory', 'authors': ['Paul Fussell et a.l.'], 'identifiers': {'isbn': '9781299600850'}, 'timeout': 30}
Using plugins: Kobo Books (1, 8, 2)
The log from individual plugins is below
****************************** Kobo Books (1, 8, 2) ******************************
Found 0 results
Downloading from Kobo Books took 30.167236328125
identify - title: "The Great War and Modern Memory" authors= "['Paul Fussell et a.l.']"
Querying: https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all
Failed to make identify query: 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
Traceback (most recent call last):
File "mechanize\_urllib2_fork.py", line 1238, in do_open
File "http\client.py", line 1347, in getresponse
File "http\client.py", line 307, in begin
File "http\client.py", line 268, in _read_status
File "socket.py", line 669, in readinto
File "ssl.py", line 1241, in recv_into
File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "calibre_plugins.kobobooks.__init__", line 167, in identify
File "mechanize\_mechanize.py", line 241, in open_novisit
File "mechanize\_mechanize.py", line 287, in _mech_open
File "mechanize\_opener.py", line 193, in open
File "mechanize\_urllib2_fork.py", line 425, in _open
File "mechanize\_urllib2_fork.py", line 414, in _call_chain
File "E:\Development\GitHub\calibre\src\calibre\utils\browser.py", line 29, in https_open
File "mechanize\_urllib2_fork.py", line 1240, in do_open
urllib.error.URLError: <urlopen error The read operation timed out>
********************************************************************************
The identify phase took 30.36 seconds
The longest time (30.167236) was taken by: Kobo Books
Merging results from different sources
We have 0 merged results, merging took: 0.00 seconds
But since I started looking at it, I do not understand why calibre is timing out. There is a redirect happening, but that has been followed in the past, and it appears to work with every other method I try for fetching the page, such as curl. The problem seems to be in the browser object built for and passed to the plugin. If I replace it with the browser from mechanize directly, it works. The code in question in the plugin is:
Code:
kobobooks_id = identifiers.get(self.ID_NAME, None)
br = self.browser
if kobobooks_id:
    matches.append(('%s%s%s'%(KoboBooks.BASE_URL, KoboBooks.BOOK_PATH, kobobooks_id), None))
    # log("identify - kobobooks_id=", kobobooks_id)
    # log("identify - matches[0]=", matches[0])
else:
    query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)
    if query is None:
        log.error('Insufficient metadata to construct query')
        return
    try:
        log.info('Querying: %s'%query)
        # br.set_handle_redirect(True)
        raw = br.open_novisit(query, timeout=timeout).read()
        # raw = br.open(query, timeout=timeout).read()
        # open('E:\\t.html', 'wb').write(raw)
    except Exception as e:
        err = 'Failed to make identify query: %r'%query
        log.exception(err)
        return as_unicode(e)
It works if I replace the "br = self.browser" line with:
Code:
from mechanize import Browser
br = Browser()
I am sure that the initial trigger for this is a change by Kobo or Akamai. But mechanize by itself can handle it, so I am not sure what the browser that calibre uses is doing differently. I've been playing with the various options for the browser, but nothing changes the result. Any suggestions?
#2 |
|
creator of calibre
Posts: 45,611
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's bot protection, based on the user agent header. The following duplicates what mechanize sends, and it works for me:
Code:
calibre-debug -c "from calibre import browser; br = browser(); br.addheaders = [('User-agent', 'Python-urllib/3.9')]; br.open('https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all')"
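Inside a plugin, the same override could be applied to the browser object before the query is made. This is only a minimal sketch, assuming `br.addheaders` is the usual mechanize list of `(name, value)` tuples. The helper name `with_user_agent` is hypothetical, and unlike the one-liner above it swaps only the User-agent entry instead of replacing the whole list:

```python
def with_user_agent(headers, user_agent):
    """Return a copy of a mechanize-style header list with User-agent replaced.

    `headers` is a list of (name, value) tuples, the shape used by
    `br.addheaders`. Header names are compared case-insensitively.
    """
    kept = [(name, value) for name, value in headers
            if name.lower() != 'user-agent']
    return kept + [('User-agent', user_agent)]


# Hypothetical use in the plugin's identify(), not tested against calibre:
#   br = self.browser
#   br.addheaders = with_user_agent(br.addheaders, 'Python-urllib/3.9')
#   raw = br.open_novisit(query, timeout=timeout).read()
```

Keeping the other headers is a design choice made for this sketch; if the bot protection also looks at them, replacing the whole list as in the command above may be the safer route.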
Last edited by kovidgoyal; 11-01-2021 at 06:50 AM. |
#3 |
|
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
The one thing I didn't try was a non-browser user agent. It works, so I'll use it and see what happens later.
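To check whether a site is filtering on the User-agent header, the same URL can be requested with different agents and a short timeout. Here is a rough sketch using only the standard library rather than calibre or mechanize. The function names and the browser-like agent string are illustrative assumptions, and the results will depend on whatever the site's bot protection does at the time:

```python
import socket
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that sends the given User-agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def probe(url, user_agent, timeout=10):
    """Return the HTTP status code, or a short description of the failure."""
    try:
        req = build_request(url, user_agent)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        return 'HTTP error %d' % e.code
    except (urllib.error.URLError, socket.timeout) as e:
        # A read timeout may surface either wrapped in URLError or directly.
        return 'failed: %s' % e


# Illustrative comparison (performs real network requests when uncommented):
#   url = 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
#   for ua in ('Python-urllib/3.9', 'Mozilla/5.0 (illustrative browser-like agent)'):
#       print(ua, '->', probe(url, ua))
```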