08-27-2024, 07:38 AM   #4
thiago.eec
Quote:
Originally Posted by JimmXinu
I'll offer a few pointers, but I haven't researched it recently.

Cloudflare can run at more than one level of blocking for a given site.

Some lower levels can be bypassed with cloudscraper.

FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, this uses its own proxy API, not a standard one. Also, this proxy works for web pages, but not images or binaries. I don't believe it works for the highest "under attack" levels of Cloudflare.

FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. This is a pain, because you have to load the page in your browser first to cache it--and not all pages are cached.

The actual cache reading code isn't mine and I don't pretend to understand it more than superficially.
Thank you for your help. I did try cloudscraper and FlareSolverr, but could not get either of them to work.
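For reference, the non-standard FlareSolverr API mentioned in the quote is a JSON command posted over HTTP. Below is a minimal sketch using only the Python standard library. It assumes a FlareSolverr instance listening on its default local endpoint (http://localhost:8191/v1); the helper names are hypothetical, and this is not the code FanFicFare uses.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port.
FLARESOLVERR_ENDPOINT = "http://localhost:8191/v1"


def build_flaresolverr_payload(url, max_timeout_ms=60000):
    """Build the JSON body for a FlareSolverr 'request.get' command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def extract_html(reply):
    """Return the page HTML from a decoded FlareSolverr reply, or None on failure."""
    if reply.get("status") != "ok":
        return None
    return reply.get("solution", {}).get("response")


def fetch_via_flaresolverr(url, endpoint=FLARESOLVERR_ENDPOINT):
    """POST the command to FlareSolverr and return the solved page HTML."""
    body = json.dumps(build_flaresolverr_payload(url)).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    # The HTTP timeout is kept longer than maxTimeout so the proxy can finish.
    with urllib.request.urlopen(request, timeout=90) as response:
        return extract_html(json.loads(response.read().decode("utf-8")))
```

Calling fetch_via_flaresolverr() with the login URL would return the HTML only if the proxy manages to solve the challenge; as noted in the quote, that may not happen at the stricter protection levels.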

Quote:
Originally Posted by kovidgoyal
@JimmXinu: Just FYI as of calibre 7.17 there is a headless browser (chromium, via Qt WebEngine) in calibre you can use to get URLs. See the WebEngineBrowser class in scraper/qt.py
I tried using WebEngineBrowser to access the login page (protected by Cloudflare), but it seems JavaScript is not enabled by default. The HTML returned contains this message: 'Enable JavaScript and cookies to continue'.
How can I enable full JavaScript and cookie support for a WebEngineBrowser instance?

This is necessary because the Cloudflare check basically gives the browser a few JS calculations to solve before allowing the user to see the login page.
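To tell whether a fetch got past the check, the returned HTML can be tested for the challenge message. A minimal sketch, assuming the message quoted above is a reliable marker (Cloudflare may change the wording at any time); the function name is hypothetical and not part of calibre:

```python
# Assumption: this is the text Cloudflare shows on its challenge page,
# taken from the HTML returned in the attempt described above.
CHALLENGE_MARKER = "Enable JavaScript and cookies to continue"


def is_challenge_page(html):
    """Return True if the HTML looks like the challenge page, not the real page."""
    return CHALLENGE_MARKER.lower() in html.lower()
```

A caller could retry or fall back to another fetch method whenever is_challenge_page() returns True for the downloaded page.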