03-20-2023, 09:54 PM   #1
kiwidude
Calibre Plugins Developer
Posts: 4,732
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Problems with random user agents and Goodreads

The Goodreads metadata plugin follows the standard pattern: it gets a browser object from calibre's generic code via self.browser, then calls browser.clone_browser() inside worker.py. From what I remember (I don't have the code in front of me), somewhere in the calibre code there is some cleverness that generates random user agents, presumably to help keep us from being blocked for scraping.

The problem I am seeing with Goodreads specifically is that, when they moved to a new website design, they started serving different HTML to different browsers. Chrome gets the full-blown new design, while other browsers such as Firefox get something different.

The latest iteration has introduced yet another page variant that differs between at least those two browser types. Frankly, it is all a giant pain to deal with, and I would rather use only Chrome-based user agents than maintain support for all these other variants, which create too much work for me.

Is there a suggested or recommended way to have the browser provided to the plugin use only Chrome-based user agents? I vaguely recall that years ago there was a method you could override in a base plugin class to supply the agents. I don't yet have my own computer with me, so I can't easily trawl through the code to work out whether that was my imagination or, if it is real, exactly what the code should look like to do what I want.
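To make the request concrete, here is a rough sketch of the selection logic only. It is plain Python and does not touch calibre: the pool of user agent strings, is_chrome() and random_chrome_user_agent() are all made-up names for illustration, and whether calibre exposes a hook where something like this could be plugged in is exactly the open question above.

```python
import random

# Hypothetical pool of user agent strings for illustration only;
# this is NOT the list calibre actually ships.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/111.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.3 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.44",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
]

# Chromium derivatives also advertise "Chrome/", so they are excluded by
# their own product tokens (Edge and Opera here; extend as needed).
NON_CHROME_TOKENS = ("Edg/", "OPR/")


def is_chrome(user_agent):
    """Return True if the string looks like Chrome itself."""
    if "Chrome/" not in user_agent:
        return False
    return not any(token in user_agent for token in NON_CHROME_TOKENS)


def random_chrome_user_agent(pool=USER_AGENT_POOL, rng=random):
    """Pick a random user agent from the pool, restricted to Chrome."""
    candidates = [ua for ua in pool if is_chrome(ua)]
    if not candidates:
        raise ValueError("no Chrome user agents in pool")
    return rng.choice(candidates)


if __name__ == "__main__":
    print(random_chrome_user_agent())
```

The idea would be that the plugin still gets some randomness across Chrome versions and platforms, but never a Firefox or Safari string that triggers a different page variant.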

Or is this all just a bad idea, likely to get the Goodreads plugin blocked completely, so that I should suck it up and support scraping all the different page variants?