It was brought to my attention that the plugin is still intermittently failing, and I have verified this for myself. The problem, as it turns out, is not necessarily user agent related. Some requests simply come back intermittently with fairly minimal HTML. I had been thinking that if the JSON wasn't fully populated I could fall back to just scraping the page, but in these circumstances the rest of the page HTML doesn't contain the content either.
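For anyone curious, the kind of check involved is something like the rough sketch below - with the caveat that the __NEXT_DATA__ marker and size threshold are just my illustration of the idea, not necessarily what the plugin literally looks for:

import re

# Rough sketch: decide whether a fetched Goodreads page actually
# contains usable content, or is one of the "minimal html" responses.
def looks_complete(html):
    if not html or len(html) < 2048:
        return False  # suspiciously small page
    # Newer Goodreads pages embed their data as JSON in a script tag;
    # if that markup is missing, scraping the rest of the page
    # won't help either. The id here is an assumption for illustration.
    return bool(re.search(r'<script[^>]*id="__NEXT_DATA__"', html))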
After a relatively quick look to figure all this out, I am frankly unsure how to proceed at this point. Goodreads have really messed with their servers in the updates they rolled out over the last year - perhaps this is an intentional anti-scraping feature, or perhaps different servers are just running different builds of their site.
The crudest workaround may be to just keep retrying until the plugin gets a valid HTML response back. This could be pretty slow of course, and there is no guarantee you won't get 50 failed attempts in a row before one succeeds.
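Something along these lines, with fetch_page standing in for the plugin's existing download code and a hard cap so a bad run can't spin forever:

import time

# Crude retry loop: re-fetch until the response passes the
# completeness check above, up to max_attempts. fetch_page() and
# looks_complete() are placeholders for the plugin's own helpers.
def fetch_with_retries(url, fetch_page, max_attempts=10, delay=1.0):
    for attempt in range(1, max_attempts + 1):
        html = fetch_page(url)
        if looks_complete(html):
            return html
        time.sleep(delay)  # brief pause before the next attempt
    raise Exception('No valid HTML after %d attempts for %s'
                    % (max_attempts, url))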
The joys of web scraping, unfortunately.