Quote:
Originally Posted by JimmXinu
I'll offer a few pointers, but I haven't researched it recently.
Cloudflare can run at more than one level of blocking for a given site.
Some lower levels can be bypassed with cloudscraper.
FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, this uses its own proxy API, not a standard one. Also, this proxy works for web pages, but not images or binaries. I don't believe it works for the highest "under attack" levels of Cloudflare.
FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. This is a pain, because you have to load the page in your browser first to cache it--and not all pages are cached.
The actual cache reading code isn't mine and I don't pretend to understand it more than superficially.
|
Thank you for your help. I did try cloudscraper and FlareSolverr, but could not make either of them work.
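For reference, here is a minimal sketch of the kind of FlareSolverr call I mean. It assumes FlareSolverr is running locally on its default endpoint (http://localhost:8191/v1) and uses its `request.get` command. The function names and the example URL are my own placeholders, not anything from FanFicFare or calibre.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port.
FLARESOLVERR_ENDPOINT = "http://localhost:8191/v1"


def build_flaresolverr_request(url, endpoint=FLARESOLVERR_ENDPOINT, max_timeout_ms=60000):
    """Build the POST request asking FlareSolverr to load `url` in its browser."""
    payload = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_html(url, endpoint=FLARESOLVERR_ENDPOINT):
    """Send the request and return the page HTML from the JSON reply."""
    request = build_flaresolverr_request(url, endpoint)
    with urllib.request.urlopen(request, timeout=90) as response:
        reply = json.load(response)
    if reply.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {reply.get('message')}")
    # The rendered page source is returned under solution.response.
    return reply["solution"]["response"]
```

Calling `fetch_html("https://example.com/login")` (placeholder URL) should return the page HTML if the challenge is solved. As noted in the quote above, this only helps with web pages, not images or binaries.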
Quote:
Originally Posted by kovidgoyal
@JimmXinu: Just FYI as of calibre 7.17 there is a headless browser (chromium, via Qt WebEngine) in calibre you can use to get URLs. See the WebEngineBrowser class in scraper/qt.py
|
I tried using WebEngineBrowser to access the login page (protected by Cloudflare), but it seems JavaScript is not enabled by default. The HTML returned contains this message: 'Enable JavaScript and cookies to continue'.
How can I enable full JavaScript and Cookies support for a WebEngineBrowser instance?
This is necessary because the Cloudflare check basically gives the browser a few JS calculations to solve before it lets the user see the login page.