Old 01-10-2021, 02:11 PM   #5
bugstomper
The easiest way to reproduce this, until I write an artificial test case, is with the plugin where I first saw it in the wild. The FanFicFare plugin currently bundles the open source version of the Python library cloudscraper. You would need to get a copy of FanFicFare now because, for reasons about to become obvious, it is likely to remove cloudscraper soon. If you install FanFicFare in Calibre, it adds a UI element to the toolbar. Use its menu item "Download from URLs" and give it the URL of any story on fanfiction.net. That will probably fail with an error that mentions seeing a version 2 challenge, which means the attempt was blocked by an advanced level of protection from Cloudflare. That error comes from cloudscraper and is now the expected result; it is part of why cloudscraper and access to fanfiction.net will likely be removed from FanFicFare shortly. You can see the error in the job's details or by running the download under calibre-debug. If it doesn't fail on that attempt, that is OK too; it is just that lately the error is the more common result.

The current release version of FanFicFare.zip is always in a link in the first post of the thread https://www.mobileread.com/forums/sh...d.php?t=259221

(The next part is something I have only tried on my machine, which runs macOS Catalina. I have not checked whether it reproduces on Windows or Linux.)

Unzip FanFicFare.zip and create a foo.so file in its root directory. The contents don't matter (a zero-length file made with touch is enough), and neither does the location: it can even go in a subdirectory you create. Zip the result back up and reinstall that zip file in Calibre.
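
For anyone who would rather script the repacking step than do it by hand, here is a minimal sketch using only Python's standard zipfile module. The function name repack_with_empty_so and the output file name in the usage note are my own placeholders, not anything from FanFicFare or Calibre, and I have only reasoned through it rather than run it against the real plugin zip.

```python
import zipfile


def repack_with_empty_so(src_zip, dst_zip, so_name="foo.so"):
    """Copy every entry of src_zip into dst_zip, then add an empty so_name.

    repack_with_empty_so is a hypothetical helper name used for illustration.
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Reuse the original ZipInfo so names and timestamps carry over.
            dst.writestr(info, src.read(info.filename))
        # Zero-length entry in the zip root, the equivalent of `touch foo.so`.
        dst.writestr(so_name, b"")
    return dst_zip
```

Calling repack_with_empty_so("FanFicFare.zip", "FanFicFare-with-so.zip") should give a zip that differs from the original only by the extra empty foo.so, which can then be installed in Calibre as described above.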

This time, when you try to download a story from fanfiction.net, it should fail with a different error:

Code:
No such file or directory: '/var/folders/j_/l6_c445j0v7gy0dw7y35jq240000gn/C/calibre_5.9.0_tmp_dsx7opbm/i0zmj1_1plugin_unzip/cloudscraper/user_agent/browsers.json'
This error comes from cloudscraper/user_agent/__init__.py, where it gets the path for its bundled file by using os.path.dirname(__file__).

I tried patching the code to replace cloudscraper/user_agent/browsers.json with a browsers.py I wrote that has a def browsers() returning the JSON as a string, and calling json.loads on that string instead of json.load on the file. With that patch, trying to get a story from fanfiction.net gets past that point to a new error, this time in the bundled requests library when it tries to read a root certificates file in order to access an https URL:

Code:
Could not find a suitable TLS CA certificate bundle, invalid path: /var/folders/j_/l6_c445j0v7gy0dw7y35jq240000gn/C/calibre_5.9.0_tmp_727zsud6/w19l8s4xplugin_unzip/certifi/cacert.pem
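
For reference, the browsers.py replacement described above can be sketched roughly like this. The JSON text is a made-up stand-in, not the real contents of browsers.json, and load_browsers is a hypothetical name for the call site; the actual patch in cloudscraper/user_agent/__init__.py would just call json.loads on whatever browsers() returns.

```python
import json

# Hypothetical stand-in for the real browsers.json text, which is not
# reproduced here.
_BROWSERS_JSON = '{"example_key": ["example value"]}'


def browsers():
    """Return the JSON text that previously lived in browsers.json."""
    return _BROWSERS_JSON


def load_browsers():
    # json.loads parses a string, so no path built from __file__ is needed.
    return json.loads(browsers())
```

The point of the change is only that the data travels as an importable Python module, so nothing has to be opened by file path.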
I added debug logging to print the value of __file__ in cloudscraper/user_agent/__init__.py, where it makes use of os.path.dirname(__file__).

I found that without foo.so, __file__ shows FanFicFare.zip as the root of the path, and with foo.so present it shows that /var/folders/... path on disk. The debug logging also showed that, at the time of the error, the /var/folders/... path given by __file__ does not exist.
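
The kind of debug logging I mean looks roughly like the sketch below. The helper name log_module_location and the message format are placeholders for illustration, not the exact lines I added; it uses only the standard logging and os modules.

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_module_location(module_file):
    """Log a module's __file__ and whether its directory exists on disk.

    log_module_location is a hypothetical helper name.
    """
    directory = os.path.dirname(module_file)
    # False both for a path inside a zip file and for a temp directory
    # that has already been removed.
    exists = os.path.isdir(directory)
    logger.debug("__file__=%r dirname=%r exists_on_disk=%s",
                 module_file, directory, exists)
    return directory, exists
```

Dropping a call like log_module_location(__file__) next to the os.path.dirname(__file__) use is enough to see which of the two path forms the module was loaded with.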

It might be easy to dismiss this as something that cloudscraper/user_agent should not be doing with __file__, but the fact that a more common package like requests fails when accessing https concerns me.