#1 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Problems with random user agents and Goodreads
The Goodreads metadata plugin follows the standard pattern of obtaining a browser object from generic calibre code via self.browser and then calling browser.clone_browser() within worker.py. From what I remember (I don't have the code in front of me), somewhere in the calibre code there is some cleverness to generate random user agents, presumably to help avoid us being blocked for scraping.
The problem I am seeing with Goodreads specifically is that, when moving to a new website design, they started serving different html to different browsers. For Chrome you get the full-blown new design, while for other browsers like Firefox you get something different. The latest iteration means there is now yet another variant between at least those two browser types. Frankly it is all a giant pain to deal with, and I would rather just have Chrome-based user agents than all these other variants, which are creating too much work for me to maintain support for.
Is there a suggested/recommended way to get Chrome-only user agents from the browser provided to the plugin? I vaguely recall that years ago there was a method you could override in a base plugin class to provide the agents. I don't yet have my own computer with me to trawl through the code and figure out whether that was my imagination or, if true, exactly what the code should look like to do what I want.
Or is this all just a bad idea, likely to see the Goodreads plugin completely blocked, and I should suck it up and try to support scraping for all the different page variants...
#2 |
creator of calibre
Posts: 45,171
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You override
Code:
    @property
    def user_agent(self):
        return random_user_agent()
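For anyone following along, here is a minimal sketch of what such an override might look like when restricted to Chrome. It is not the Goodreads plugin's actual code: the `Source` class below is only a stand-in so the example runs outside calibre, and `ChromeOnlySource` and the strings in `CHROME_USER_AGENTS` are hypothetical illustrations.

```python
import random

# Hypothetical Chrome user-agent strings, for illustration only.
# They are not taken from calibre's own user-agent data.
CHROME_USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
)


class Source:
    """Stand-in for the calibre base plugin class (assumption),
    included only so this sketch runs on its own."""

    @property
    def user_agent(self):
        # Pretend the default could be any browser, e.g. Firefox.
        return ("Mozilla/5.0 (X11; Linux x86_64; rv:120.0) "
                "Gecko/20100101 Firefox/120.0")


class ChromeOnlySource(Source):
    """Hypothetical plugin class that only ever reports Chrome."""

    @property
    def user_agent(self):
        # Override the property so every request uses a Chrome agent.
        return random.choice(CHROME_USER_AGENTS)
```

In a real plugin the property would sit on the plugin's own class, and the list of agents would more sensibly come from calibre's helpers than from hard-coded strings.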
#3 |
Calibre Plugins Developer
|
Thank you Kovid for the steer, I shall take a look at this in a few weeks' time.
|
#4 |
Calibre Plugins Developer
|
Hi Kovid,
A user kindly contributed some code for me that makes use of a random_chome_user_agent() function from random_ua.py. However, someone else has since told me that this was only added in calibre 6. Looking at the history, I can see there was previously a random_chrome_ua() function from back in 2017. Can you please give me a rough minimum version of calibre I could use for calling that as a fallback? I am trying not to force calibre 6 as the minimum version, for the widest possible support.
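The fallback being asked about could be sketched roughly as below. This is an illustration under assumptions, not tested against any calibre release: `first_available` and `static_chrome_ua` are hypothetical helpers, the module path `calibre.utils.random_ua` is assumed from the `random_ua.py` file name, and the two candidate function names are copied from the post exactly as written, so they should be checked against the installed `random_ua.py`.

```python
import importlib


def first_available(module_name, candidate_names, default):
    """Return the first callable found under candidate_names in
    module_name, or default if the module or all names are missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not present at all (e.g. running outside calibre).
        return default
    for name in candidate_names:
        func = getattr(module, name, None)
        if callable(func):
            return func
    return default


def static_chrome_ua():
    """Last-resort fallback: a single hard-coded, hypothetical Chrome agent."""
    return ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")


# Names as quoted in the post above; verify the exact spelling against
# the random_ua.py shipped with the calibre version in use.
pick_user_agent = first_available(
    "calibre.utils.random_ua",
    ("random_chome_user_agent", "random_chrome_ua"),
    static_chrome_ua,
)
```

Trying the newer name first and the older one second means a single plugin build can run on both old and new calibre without a version check.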
#5 |
creator of calibre
|
It was added in caac92bbd8, so any release after that should be fine. You can find the list of releases containing any commit on GitHub:
https://github.com/kovidgoyal/calibr...7162ad729850a4
which here gives us 2.81.0. Though I have no idea how well that function works in historical calibre releases.
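If a plugin wanted to act on that 2.81.0 figure, a simple tuple comparison would do. The sketch below is only an illustration: `meets_minimum` and `MIN_VERSION_FOR_OLDER_HELPER` are hypothetical names, and the caller is assumed to supply the running calibre version as a tuple of integers.

```python
# Minimum release quoted above for the older Chrome helper.
MIN_VERSION_FOR_OLDER_HELPER = (2, 81, 0)


def meets_minimum(version, minimum=MIN_VERSION_FOR_OLDER_HELPER):
    """Return True if version (a tuple such as (5, 44, 0)) is at least
    minimum. Tuples compare element by element, so 2.81.0 sorts after
    2.9.0, which a plain string comparison would get wrong."""
    return tuple(version) >= tuple(minimum)
```

For example, `meets_minimum((6, 0, 0))` is true while `meets_minimum((2, 80, 0))` is false, so the plugin could skip the Chrome-only path on anything older.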
#6 |
Calibre Plugins Developer
|
That's great, thanks Kovid. Yeah, it will be an "at your own risk" type of support for anyone clinging on to an older calibre: if it works then it is a bonus, if not they upgrade.
|