Timeout when fetching metadata

davidfor · 11-01-2021, 05:02 AM

My Kobo Books metadata source plugin stopped working recently. The report for it in the plugin's thread is here. From the error, it is a timeout. The log I see when I run it is:
calibre, version 5.31.1
ERROR: No matches found: <p>Failed to find any books that match your search. Try making the search <b>less specific</b>. For example, use only the author's last name and a single distinctive word from the title.<p>To see the full log, click "Show details".

Code:

Running identify query with parameters: 
{'title': 'The Great War and Modern Memory', 'authors': ['Paul Fussell et a.l.'], 'identifiers': {'isbn': '9781299600850'}, 'timeout': 30} 
Using plugins: Kobo Books (1, 8, 2) 
The log from individual plugins is below 

****************************** Kobo Books (1, 8, 2) ****************************** 
Found 0 results 
Downloading from Kobo Books took 30.167236328125 
identify - title: "The Great War and Modern Memory" authors= "['Paul Fussell et a.l.']"
Querying: https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all
Failed to make identify query: 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
Traceback (most recent call last):
  File "mechanize\_urllib2_fork.py", line 1238, in do_open
  File "http\client.py", line 1347, in getresponse
  File "http\client.py", line 307, in begin
  File "http\client.py", line 268, in _read_status
  File "socket.py", line 669, in readinto
  File "ssl.py", line 1241, in recv_into
  File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "calibre_plugins.kobobooks.__init__", line 167, in identify
  File "mechanize\_mechanize.py", line 241, in open_novisit
  File "mechanize\_mechanize.py", line 287, in _mech_open
  File "mechanize\_opener.py", line 193, in open
  File "mechanize\_urllib2_fork.py", line 425, in _open
  File "mechanize\_urllib2_fork.py", line 414, in _call_chain
  File "E:\Development\GitHub\calibre\src\calibre\utils\browser.py", line 29, in https_open
  File "mechanize\_urllib2_fork.py", line 1240, in do_open
urllib.error.URLError: <urlopen error The read operation timed out> 

******************************************************************************** 
The identify phase took 30.36 seconds 
The longest time (30.167236) was taken by: Kobo Books 
Merging results from different sources 
We have 0 merged results, merging took: 0.00 seconds

When I started investigating, it appeared that Kobo has moved the site to Akamai hosting or caching. I don't think that is new, but they might have changed some details, or exactly how much is hosted by Akamai.

But since I started looking at it, I do not understand why calibre is timing out. There is a redirect happening, but that has been followed in the past, and it appears to work with any other method I use to fetch the page, such as curl.

The problem seems to be in the browser object built for and passed to the plugin. If I replace it with the browser from mechanize directly, it works.

The code in question in the plugin is:

Code:

        kobobooks_id = identifiers.get(self.ID_NAME, None)
        br = self.browser
        if kobobooks_id:
            matches.append(('%s%s%s'%(KoboBooks.BASE_URL, KoboBooks.BOOK_PATH, kobobooks_id), None))
#            log("identify - kobobooks_id=", kobobooks_id)
#            log("identify - matches[0]=", matches[0])
        else:
            query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)
            if query is None:
                log.error('Insufficient metadata to construct query')
                return
            try:
                log.info('Querying: %s'%query)
#                 br.set_handle_redirect(True)
                raw = br.open_novisit(query, timeout=timeout).read()
#                 raw = br.open(query, timeout=timeout).read()
#                 open('E:\\t.html', 'wb').write(raw)
            except Exception as e:
                err = 'Failed to make identify query: %r'%query
                log.exception(err)
                return as_unicode(e)

Using that produces the timeout mentioned in the plugin's thread.

If I replace the "br = self.browser" with:

Code:

        from mechanize import Browser
        br = Browser()

and make the same change to the line in worker.py that clones the browser, it works.

I am sure that the initial trigger for this is a change by Kobo or Akamai. But mechanize by itself can handle it, so I am not sure what the browser that calibre uses is doing differently. I've been playing with the various options for the browser, but nothing changes the result. Any suggestions?

kovidgoyal · 11-01-2021, 06:46 AM

It's bot protection, based on the user agent header. The following duplicates what mechanize sends, and works for me:

Code:

calibre-debug -c "from calibre import browser; br = browser(); br.addheaders = [('User-agent', 'Python-urllib/3.9')]; br.open('https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all')"
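One way to check this kind of diagnosis outside calibre is to request the same URL with different User-agent values and compare the outcomes. The following is only a rough sketch using the Python standard library. The `probe` helper and its `opener` parameter are hypothetical names made up for this illustration, not part of calibre or mechanize, and any browser-style user agent string you compare against is your own choice.

```python
import urllib.request


def probe(url, user_agent, timeout=10, opener=urllib.request.urlopen):
    """Fetch url with the given User-agent and summarise what happened.

    `opener` is a hypothetical hook so the network call can be swapped
    out when experimenting; by default it is urllib.request.urlopen.
    """
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    try:
        with opener(req, timeout=timeout) as resp:
            return 'ok: HTTP %d' % resp.status
    except OSError as e:
        # URLError, HTTPError and socket.timeout are all OSError subclasses,
        # so a read timeout like the one in the log above ends up here.
        return 'failed: %s' % e
```

Calling `probe(url, 'Python-urllib/3.9')` and then `probe(url, some_browser_user_agent)` with the search URL from the log should show whether the server treats the two differently. A timeout for one and a normal response for the other would be consistent with user-agent-based filtering.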

davidfor · 11-01-2021, 08:21 AM

The one thing I didn't try was a non-browser user agent. It works, I'll use it and see what happens later.
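A minimal sketch of how that could be applied in the plugin, assuming the browser's `addheaders` is a list of `(name, value)` tuples as in the one-liner above. The `with_user_agent` helper is a hypothetical name for illustration only. Unlike the one-liner, which replaces the whole header list, it keeps any other headers already set. Whether those other headers matter to the bot protection is not something established here.

```python
def with_user_agent(addheaders, user_agent):
    """Return a copy of a mechanize-style header list with User-agent replaced.

    Hypothetical helper: drops any existing User-agent entry (matched
    case-insensitively) and appends the new one, leaving other headers alone.
    """
    kept = [(name, value) for name, value in addheaders
            if name.lower() != 'user-agent']
    return kept + [('User-agent', user_agent)]
```

In the identify code this would be used as `br.addheaders = with_user_agent(br.addheaders, 'Python-urllib/3.9')` before the call to `br.open_novisit(query, timeout=timeout)`.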



