My Kobo Books metadata source plugin stopped working recently. The report for it in the plugin's thread is
here. From the error, it is a timeout. The log I see when I do this is:
calibre, version 5.31.1
ERROR: No matches found: <p>Failed to find any books that match your search. Try making the search <b>less specific</b>. For example, use only the author's last name and a single distinctive word from the title.<p>To see the full log, click "Show details".
Code:
Running identify query with parameters:
{'title': 'The Great War and Modern Memory', 'authors': ['Paul Fussell et a.l.'], 'identifiers': {'isbn': '9781299600850'}, 'timeout': 30}
Using plugins: Kobo Books (1, 8, 2)
The log from individual plugins is below
****************************** Kobo Books (1, 8, 2) ******************************
Found 0 results
Downloading from Kobo Books took 30.167236328125
identify - title: "The Great War and Modern Memory" authors= "['Paul Fussell et a.l.']"
Querying: https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all
Failed to make identify query: 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
Traceback (most recent call last):
File "mechanize\_urllib2_fork.py", line 1238, in do_open
File "http\client.py", line 1347, in getresponse
File "http\client.py", line 307, in begin
File "http\client.py", line 268, in _read_status
File "socket.py", line 669, in readinto
File "ssl.py", line 1241, in recv_into
File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "calibre_plugins.kobobooks.__init__", line 167, in identify
File "mechanize\_mechanize.py", line 241, in open_novisit
File "mechanize\_mechanize.py", line 287, in _mech_open
File "mechanize\_opener.py", line 193, in open
File "mechanize\_urllib2_fork.py", line 425, in _open
File "mechanize\_urllib2_fork.py", line 414, in _call_chain
File "E:\Development\GitHub\calibre\src\calibre\utils\browser.py", line 29, in https_open
File "mechanize\_urllib2_fork.py", line 1240, in do_open
urllib.error.URLError: <urlopen error The read operation timed out>
********************************************************************************
The identify phase took 30.36 seconds
The longest time (30.167236) was taken by: Kobo Books
Merging results from different sources
We have 0 merged results, merging took: 0.00 seconds
When I started investigating, it appeared that Kobo has moved the site to use Akamai hosting or caching. I don't think that is new, but they might have changed some details, or exactly how much is hosted by Akamai.
But since I started looking at it, I still do not understand why calibre is timing out. There is a redirect happening, but that has been followed in the past, and it appears to work with every other method I try to fetch the page, such as curl.
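As a quick check outside calibre, here is a minimal sketch of the same fetch using only the Python standard library. The helper names build_request and try_fetch and the optional user_agent parameter are mine, for illustration only; none of this is calibre or plugin code. That the User-Agent header might matter is just a guess on my part.

```python
import socket
import urllib.error
import urllib.request

# The search URL from the log above.
QUERY_URL = "https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all"


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def try_fetch(url, user_agent=None, timeout=30):
    """Return (ok, detail): a short summary on success, the error text on failure."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = resp.read()
            # geturl() reports the final URL after any redirects were followed.
            return True, "%d bytes, final URL %s" % (len(body), resp.geturl())
    except (urllib.error.URLError, socket.timeout) as e:
        return False, repr(e)
```

Calling try_fetch(QUERY_URL) once with no user agent and once with a different one would show whether the timeout depends on what the client sends, or whether it is specific to the calibre browser object.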
The problem seems to be in the browser object that is built for and passed to the plugin. If I replace it with the browser from mechanize directly, it works.
The code in question in the plugin is:
Code:
kobobooks_id = identifiers.get(self.ID_NAME, None)
br = self.browser
if kobobooks_id:
    matches.append(('%s%s%s'%(KoboBooks.BASE_URL, KoboBooks.BOOK_PATH, kobobooks_id), None))
    # log("identify - kobobooks_id=", kobobooks_id)
    # log("identify - matches[0]=", matches[0])
else:
    query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)
    if query is None:
        log.error('Insufficient metadata to construct query')
        return
    try:
        log.info('Querying: %s'%query)
        # br.set_handle_redirect(True)
        raw = br.open_novisit(query, timeout=timeout).read()
        # raw = br.open(query, timeout=timeout).read()
        # open('E:\\t.html', 'wb').write(raw)
    except Exception as e:
        err = 'Failed to make identify query: %r'%query
        log.exception(err)
        return as_unicode(e)
Using that produces the timeout mentioned in the plugin's thread.
If I replace the "br = self.browser" line with:
Code:
from mechanize import Browser
br = Browser()
and likewise replace the line in worker.py that clones the browser, it works.
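One way to narrow down what differs between the two browsers would be to compare the default headers each one sends. This is only a sketch: header_diff is a hypothetical helper of mine, and it assumes both browser objects expose their defaults as an addheaders list of (name, value) pairs, as a mechanize Browser does.

```python
def header_diff(first_headers, second_headers):
    """Compare two lists of (name, value) header pairs.

    Header names are matched case-insensitively. Returns a dict mapping each
    differing header name (lower-cased) to a tuple of
    (value in the first list or None, value in the second list or None).
    """
    first = {name.lower(): value for name, value in first_headers}
    second = {name.lower(): value for name, value in second_headers}
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(set(first) | set(second))
        if first.get(name) != second.get(name)
    }
```

Inside the plugin this would be called as header_diff(Browser().addheaders, self.browser.addheaders). An empty result would mean the default headers are the same and the difference is somewhere else in how the calibre browser is set up.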
I am sure that the initial trigger for this is a change by Kobo or Akamai. But mechanize by itself can handle it, so I am not sure what the browser that calibre uses is doing differently. I've been playing with the various options for the browser, but nothing changes the result. Any suggestions?