09-02-2024, 07:09 AM   #12
thiago.eec
Wizard
Posts: 1,217
Karma: 1419583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
Quote:
Originally Posted by Bradles
LibraryThing also started using Cloudflare a while ago. For my plugin (LibraryThing Match) I had to change the user agent twice to avoid 403 errors. This is what it is currently:

'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
Thanks for the tip. I tried it here just to be sure, but it didn't work.

A couple of years ago I started sending complete headers, as Skoob was blocking requests that used the default header. Then I moved to calibre's random_ua method, so I wouldn't always hit the server with the same header. This worked for more than four years, but it fails now.
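For anyone who wants to see what "complete headers with a rotating user agent" looks like in practice, here is a minimal sketch using only the standard library. It is not the plugin's actual code and it does not use calibre's helper: the USER_AGENTS pool, the header values, and the build_headers/build_request names are all made up for illustration. As noted above, this approach no longer gets past Skoob.

```python
import random
import urllib.request

# Hypothetical pool of browser user agent strings (illustrative values only).
# The plugin described above relied on calibre's own helper instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def build_headers(rng=random):
    """Return a browser-like header set with a randomly chosen user agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
    }


def build_request(url):
    """Build (but do not send) a request that carries the full header set."""
    return urllib.request.Request(url, headers=build_headers())
```

Passing the result of build_request() to urllib.request.urlopen() would send it. Each call picks a fresh user agent, so consecutive requests don't all look identical to the server.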

When you access the login page, it lands on a Cloudflare page, where you need to wait a few seconds before proceeding. No need to click anything. From what I could gather, Cloudflare uses some JavaScript code to create a challenge for the 'browser' to solve. If you are using standard Python libraries (urllib or requests), you have no JavaScript capabilities and get blocked. A headless browser could solve it, but it doesn't seem to be that simple. calibre's headless browser, for instance, has no JavaScript enabled, as Kovid stated.
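If it helps anyone debugging the same thing, a plugin can at least tell a Cloudflare challenge apart from an ordinary error and report it clearly. The sketch below is only a heuristic based on what challenge pages commonly look like (a 403 or 503 status, a "cloudflare" Server header, a "Just a moment" title or a /cdn-cgi/challenge-platform/ script, and sometimes a cf-mitigated header). The marker list and the function name are my assumptions, not anything Skoob-specific, and Cloudflare can change these details at any time.

```python
# Assumed markers of a Cloudflare challenge page; heuristic, not exhaustive.
CHALLENGE_MARKERS = (
    "just a moment",
    "/cdn-cgi/challenge-platform/",
    "cf-chl",
)


def looks_like_cloudflare_challenge(status, headers, body):
    """Guess whether a response is a Cloudflare JavaScript challenge.

    status  -- HTTP status code (int)
    headers -- mapping of response header names to values
    body    -- response body as text
    """
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}

    # Some challenge responses announce themselves with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    blocked = status in (403, 503)
    text = body.lower()
    has_marker = any(marker in text for marker in CHALLENGE_MARKERS)
    return served_by_cloudflare and blocked and has_marker
```

This only detects the block. It does not solve the challenge, which still needs something that can run JavaScript.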

FlareSolverr and cloudscraper didn't work either. I even tested a paid service, https://scrapeops.io/. It managed to access the login page, but I couldn't make a successful POST request (even with help from their support). They have a js_scenario option, which you can use to fill in the form and press the button instead of sending a POST request, but that didn't work either.

Anyway, looks like a lost cause.