11-01-2021, 04:02 AM | #1 |
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Timeout when fetching metadata
My Kobo Books metadata source plugin stopped working recently. The report for it in the plugin's thread is here. From the error, it is a timeout. The log I see when I run it is:
calibre, version 5.31.1
ERROR: No matches found: <p>Failed to find any books that match your search. Try making the search <b>less specific</b>. For example, use only the author's last name and a single distinctive word from the title.<p>To see the full log, click "Show details".
Code:
Running identify query with parameters:
{'title': 'The Great War and Modern Memory', 'authors': ['Paul Fussell et a.l.'], 'identifiers': {'isbn': '9781299600850'}, 'timeout': 30}
Using plugins: Kobo Books (1, 8, 2)
The log from individual plugins is below

****************************** Kobo Books (1, 8, 2) ******************************
Found 0 results
Downloading from Kobo Books took 30.167236328125
identify - title: "The Great War and Modern Memory" authors= "['Paul Fussell et a.l.']"
Querying: https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all
Failed to make identify query: 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
Traceback (most recent call last):
  File "mechanize\_urllib2_fork.py", line 1238, in do_open
  File "http\client.py", line 1347, in getresponse
  File "http\client.py", line 307, in begin
  File "http\client.py", line 268, in _read_status
  File "socket.py", line 669, in readinto
  File "ssl.py", line 1241, in recv_into
  File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "calibre_plugins.kobobooks.__init__", line 167, in identify
  File "mechanize\_mechanize.py", line 241, in open_novisit
  File "mechanize\_mechanize.py", line 287, in _mech_open
  File "mechanize\_opener.py", line 193, in open
  File "mechanize\_urllib2_fork.py", line 425, in _open
  File "mechanize\_urllib2_fork.py", line 414, in _call_chain
  File "E:\Development\GitHub\calibre\src\calibre\utils\browser.py", line 29, in https_open
  File "mechanize\_urllib2_fork.py", line 1240, in do_open
urllib.error.URLError: <urlopen error The read operation timed out>

********************************************************************************
The identify phase took 30.36 seconds
The longest time (30.167236) was taken by: Kobo Books
Merging results from different sources
We have 0 merged results, merging took: 0.00 seconds

But since I started looking at it, I do not understand why calibre is timing out. There is a redirect happening, but that has been followed in the past, and it appears to work with every other method I use to fetch the page, such as curl. The problem seems to be in the browser object built for and passed to the plugin. If I replace this with the browser from mechanize directly, it works. The code in question in the plugin is:
Code:
kobobooks_id = identifiers.get(self.ID_NAME, None)
br = self.browser
if kobobooks_id:
    matches.append(('%s%s%s'%(KoboBooks.BASE_URL, KoboBooks.BOOK_PATH, kobobooks_id), None))
    # log("identify - kobobooks_id=", kobobooks_id)
    # log("identify - matches[0]=", matches[0])
else:
    query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)
    if query is None:
        log.error('Insufficient metadata to construct query')
        return
    try:
        log.info('Querying: %s'%query)
        # br.set_handle_redirect(True)
        raw = br.open_novisit(query, timeout=timeout).read()
        # raw = br.open(query, timeout=timeout).read()
        # open('E:\\t.html', 'wb').write(raw)
    except Exception as e:
        err = 'Failed to make identify query: %r'%query
        log.exception(err)
        return as_unicode(e)

If I replace the "br = self.browser" with:
Code:
from mechanize import Browser
br = Browser()

then the query works. I am sure that the initial trigger for this is a change by Kobo or Akamai. But mechanize by itself can handle it, so I am not sure what the browser that calibre uses is doing. I've been playing with the various options for the browser, but nothing changes the result. Any suggestions? |
11-01-2021, 05:46 AM | #2 |
creator of calibre
Posts: 44,012
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's bot protection, based on the user agent header. The following duplicates what mechanize sends, and it works for me:
Code:
calibre-debug -c "from calibre import browser; br = browser(); br.addheaders = [('User-agent', 'Python-urllib/3.9')]; br.open('https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all')"

Last edited by kovidgoyal; 11-01-2021 at 05:50 AM. |
11-01-2021, 07:21 AM | #3 |
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
The one thing I didn't try was a non-browser user agent. It works, I'll use it and see what happens later.
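For anyone else hitting this, here is a rough sketch of how the user agent could be swapped without discarding the browser's other default headers. The helper name with_user_agent and the sample browser-style header list are made up for illustration; only the 'Python-urllib/3.9' value comes from the command above. Whether Kobo's bot protection also looks at other headers is not something I know, so treat this as an assumption to test rather than a confirmed fix.

```python
import urllib.request


def with_user_agent(addheaders, user_agent):
    """Return a copy of a mechanize/urllib-style header list with User-agent replaced.

    Hypothetical helper: `addheaders` is a list of (name, value) tuples, the
    shape used by the `addheaders` attribute of urllib and mechanize openers.
    """
    kept = [(name, value) for name, value in addheaders
            if name.lower() != 'user-agent']
    return kept + [('User-agent', user_agent)]


# The standard-library opener defaults to a 'Python-urllib/X.Y' user agent,
# the same form as the value used in the calibre-debug command above.
default_headers = urllib.request.build_opener().addheaders

# Stand-in for a browser-like header list (invented values for illustration).
browser_like = [
    ('User-agent', 'Mozilla/5.0 (hypothetical browser string)'),
    ('Accept', '*/*'),
]
patched = with_user_agent(browser_like, 'Python-urllib/3.9')
print(patched)
```

In the plugin, the equivalent would presumably be something like br.addheaders = with_user_agent(br.addheaders, 'Python-urllib/3.9') right after br = self.browser, though I have not checked how that interacts with calibre's own browser setup.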
|
|