The easiest way to reproduce this, before I write an artificial test case, is to use the plugin where I saw it in the wild. The FanFicFare plugin currently bundles the open source version of the Python library cloudscraper. You would need to get a copy of FanFicFare now because, for reasons about to become obvious, it is likely to remove cloudscraper soon. If you install FanFicFare in Calibre, it adds a UI element to the toolbar. Use its menu item "Download from URLs" and give it the URL of any story on fanfiction.net. That will probably fail with an error that mentions a version 2 challenge, meaning the attempt was blocked by a more advanced level of Cloudflare protection. That error comes from cloudscraper and is now the expected result; it is part of why cloudscraper and access to fanfiction.net will likely be removed from FanFicFare shortly. You can see the error in the job's details or by running it under calibre-debug. If it doesn't fail on that attempt, that is ok too; lately the error is just the more common result.
The current release version of FanFicFare.zip is always linked in the first post of the thread:
https://www.mobileread.com/forums/sh...d.php?t=259221
(The next part is something I have only tried on my machine, which runs macOS Catalina. I have not checked whether it reproduces on Windows or Linux.)
Unzip FanFicFare.zip and create a file named foo.so in its root directory. The contents don't matter (a zero-length file made with touch works), and it can even go in a subdirectory you create. Zip the result back up and reinstall that zip in Calibre.
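For anyone who would rather script that step, here is a minimal sketch using Python's standard zipfile module. The function name add_empty_member and the output file name are made up for illustration; it copies every entry of the original zip into a new zip and adds a zero-length foo.so at the root.

```python
import zipfile


def add_empty_member(src_zip, dst_zip, member_name="foo.so"):
    """Copy src_zip to dst_zip and add a zero-length file called member_name."""
    with zipfile.ZipFile(src_zip, "r") as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy each existing entry, keeping its name and metadata.
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        # Add the empty file; its contents are irrelevant to the repro.
        dst.writestr(member_name, b"")


# Example call (file names are assumptions):
# add_empty_member("FanFicFare.zip", "FanFicFare-with-so.zip")
```

The resulting zip would then be installed in Calibre in place of the original.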
This time, when you try to download a story from fanfiction.net, it should fail with a different error:
Code:
No such file or directory: '/var/folders/j_/l6_c445j0v7gy0dw7y35jq240000gn/C/calibre_5.9.0_tmp_dsx7opbm/i0zmj1_1plugin_unzip/cloudscraper/user_agent/browsers.json'
This error comes from cloudscraper/user_agent/__init__.py, where it gets the path for its included file by using os.path.dirname(__file__).
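As an illustration of that pattern (this is a sketch of the general technique, not the actual cloudscraper source; load_bundled_json is a hypothetical name), a module that locates a data file next to itself typically does something like the following. It only works if the directory named by __file__ really exists on disk when the file is opened.

```python
import json
import os


def load_bundled_json(module_file, name="browsers.json"):
    """Load a JSON file expected to sit in the same directory as module_file.

    module_file stands in for the module's own __file__ value.
    """
    path = os.path.join(os.path.dirname(module_file), name)
    # Raises FileNotFoundError if that directory or file is not on disk.
    with open(path, "r") as fp:
        return json.load(fp)
```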
I tried patching the code to replace cloudscraper/user_agent/browsers.json with a browsers.py I wrote that has a def browsers() that returns a string, and calling json.loads on that string instead of json.load on the file. When I do that, trying to get a story from fanfiction.net gets past that point to a new error, which is in the bundled requests library when it tries to read a root certificates file while accessing an https URL:
Code:
Could not find a suitable TLS CA certificate bundle, invalid path: /var/folders/j_/l6_c445j0v7gy0dw7y35jq240000gn/C/calibre_5.9.0_tmp_727zsud6/w19l8s4xplugin_unzip/certifi/cacert.pem
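The browsers.py workaround described above could look roughly like this. It is a minimal sketch: the JSON text is a placeholder rather than the real contents of browsers.json, and load_browsers is a hypothetical helper standing in for the patched call site.

```python
import json


def browsers():
    """Return the JSON text that would otherwise be read from browsers.json."""
    # Placeholder only; the real file's contents would be pasted in here.
    return '{"example_key": ["example value"]}'


def load_browsers():
    # json.loads parses a string, so no file path (and no __file__) is needed.
    return json.loads(browsers())
```

Because the data lives inside a Python module, it is loaded through the normal import machinery instead of a path on disk.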
I added debug logging to print the value of __file__ in cloudscraper/user_agent/__init__.py where it makes use of os.path.dirname(__file__). I found that without foo.so, it shows FanFicFare.zip as the root of the path, and with foo.so present it shows that /var/folders/... path on the disk. The debug logging also showed that at the time of the error the /var/folders/... path specified by __file__ does not exist.
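A check of that kind can be sketched as below. The helper name describe_module_path is made up for illustration, and it assumes standard logging is acceptable inside the plugin; it reports the directory derived from __file__ and whether that directory is present on disk.

```python
import logging
import os

logger = logging.getLogger(__name__)


def describe_module_path(module_file):
    """Log and return the directory of module_file and whether it exists."""
    directory = os.path.dirname(module_file)
    exists = os.path.isdir(directory)
    logger.debug("__file__=%r dirname=%r exists=%s",
                 module_file, directory, exists)
    return directory, exists


# Inside the module being investigated one would call:
# describe_module_path(__file__)
```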
It might be easy to dismiss this as something that cloudscraper/user_agent should not be doing with __file__, but the fact that a more common package like requests fails when accessing https concerns me.