Old 11-01-2021, 04:02 AM   #1
davidfor
Grand Sorcerer
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Timeout when fetching metadata

My Kobo Books metadata source plugin stopped working recently. The report for it in the plugin's thread is here. From the error, it is a timeout. The log I see when I run it is:
calibre, version 5.31.1
ERROR: No matches found: <p>Failed to find any books that match your search. Try making the search <b>less specific</b>. For example, use only the author's last name and a single distinctive word from the title.<p>To see the full log, click "Show details".

Code:
Running identify query with parameters: 
{'title': 'The Great War and Modern Memory', 'authors': ['Paul Fussell et a.l.'], 'identifiers': {'isbn': '9781299600850'}, 'timeout': 30} 
Using plugins: Kobo Books (1, 8, 2) 
The log from individual plugins is below 

****************************** Kobo Books (1, 8, 2) ****************************** 
Found 0 results 
Downloading from Kobo Books took 30.167236328125 
identify - title: "The Great War and Modern Memory" authors= "['Paul Fussell et a.l.']"
Querying: https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all
Failed to make identify query: 'https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all'
Traceback (most recent call last):
  File "mechanize\_urllib2_fork.py", line 1238, in do_open
  File "http\client.py", line 1347, in getresponse
  File "http\client.py", line 307, in begin
  File "http\client.py", line 268, in _read_status
  File "socket.py", line 669, in readinto
  File "ssl.py", line 1241, in recv_into
  File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "calibre_plugins.kobobooks.__init__", line 167, in identify
  File "mechanize\_mechanize.py", line 241, in open_novisit
  File "mechanize\_mechanize.py", line 287, in _mech_open
  File "mechanize\_opener.py", line 193, in open
  File "mechanize\_urllib2_fork.py", line 425, in _open
  File "mechanize\_urllib2_fork.py", line 414, in _call_chain
  File "E:\Development\GitHub\calibre\src\calibre\utils\browser.py", line 29, in https_open
  File "mechanize\_urllib2_fork.py", line 1240, in do_open
urllib.error.URLError: <urlopen error The read operation timed out> 

******************************************************************************** 
The identify phase took 30.36 seconds 
The longest time (30.167236) was taken by: Kobo Books 
Merging results from different sources 
We have 0 merged results, merging took: 0.00 seconds
When I started investigating, it appeared that Kobo has moved the site to Akamai hosting or caching. I don't think that is new, but they might have changed some details, or exactly how much is hosted by Akamai.

But now that I have started looking at it, I do not understand why calibre is timing out. There is a redirect happening, but that has been followed in the past, and it appears to work with every other method I have tried for fetching the page, such as curl.

The problem seems to be in the browser object built for and passed to the plugin. If I replace it with the mechanize browser used directly, it works.

The code in question in the plugin is:

Code:
        kobobooks_id = identifiers.get(self.ID_NAME, None)
        br = self.browser
        if kobobooks_id:
            matches.append(('%s%s%s'%(KoboBooks.BASE_URL, KoboBooks.BOOK_PATH, kobobooks_id), None))
#            log("identify - kobobooks_id=", kobobooks_id)
#            log("identify - matches[0]=", matches[0])
        else:
            query = self.create_query(log, title=title, authors=authors, identifiers=identifiers)
            if query is None:
                log.error('Insufficient metadata to construct query')
                return
            try:
                log.info('Querying: %s'%query)
#                 br.set_handle_redirect(True)
                raw = br.open_novisit(query, timeout=timeout).read()
#                 raw = br.open(query, timeout=timeout).read()
#                 open('E:\\t.html', 'wb').write(raw)
            except Exception as e:
                err = 'Failed to make identify query: %r'%query
                log.exception(err)
                return as_unicode(e)
Using that produces the timeout mentioned in the plugin's thread.

If I replace the "br = self.browser" with:

Code:
        from mechanize import Browser
        br = Browser()
and do the same for the line in worker.py that clones the browser, it works.

I am sure that the initial trigger for this is a change by Kobo or Akamai. But, mechanize by itself can handle it, so I am not sure what the browser that calibre uses is doing. I've been playing with the various options for the browser, but, nothing changes the result. Any suggestions?
Old 11-01-2021, 05:46 AM   #2
kovidgoyal
creator of calibre
Posts: 44,012
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's bot protection, based on the user agent header. The following duplicates what mechanize sends, and it works for me:

Code:
calibre-debug -c "from calibre import browser; br = browser(); br.addheaders = [('User-agent', 'Python-urllib/3.9')]; br.open('https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all')"
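
The same header change could be made in a plugin on the browser object it is given. The following is a minimal sketch, not the plugin's actual fix: `with_user_agent` is a hypothetical helper name, and it assumes only that `addheaders` is a list of `(name, value)` tuples, as in the one-liner above. It replaces any existing User-agent entry instead of adding a second one.

```python
def with_user_agent(addheaders, user_agent):
    """Return a copy of a mechanize-style header list with exactly one User-agent."""
    # Drop any existing User-agent entry; header names are case-insensitive.
    headers = [(name, value) for name, value in addheaders
               if name.lower() != 'user-agent']
    headers.append(('User-agent', user_agent))
    return headers


# Hypothetical use inside the plugin's identify() (an assumption, not tested
# against the plugin):
#     br = self.browser
#     br.addheaders = with_user_agent(br.addheaders, 'Python-urllib/3.9')
example = with_user_agent(
    [('User-agent', 'Mozilla/5.0'), ('Accept', '*/*')],
    'Python-urllib/3.9',
)
print(example)
```

Keeping the other headers untouched means only the user agent differs from what calibre normally sends, which makes it easier to confirm that this header alone is what the site reacts to.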

Old 11-01-2021, 07:21 AM   #3
davidfor
The one thing I didn't try was a non-browser user agent. It works, I'll use it and see what happens later.
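
For anyone wanting to check this kind of user-agent filtering outside calibre, a rough sketch using only the Python standard library might look like the following. The function names and the browser-like string are placeholders invented for illustration, and how the site responds to each agent may change at any time.

```python
import socket
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def probe(url, user_agent, timeout=30):
    """Return the HTTP status code, or the name of the exception raised."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status.
        return e.code
    except (urllib.error.URLError, socket.timeout) as e:
        # No usable answer, e.g. the read timed out.
        return type(e).__name__


def compare(url, user_agents, timeout=30):
    """Probe the same URL once per user agent and collect the outcomes."""
    return {ua: probe(url, ua, timeout) for ua in user_agents}


# Example call (makes real network requests, so it is left commented out):
#     compare('https://www.kobo.com/search?Query=9781299600850&fcmedia=Book&fclanguages=all',
#             ['Python-urllib/3.9', 'Mozilla/5.0 (placeholder browser string)'])
```

If one agent returns a status code and the other returns a timeout, that points at filtering on the header and not at a network or redirect problem.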