WSJ recipe fails to find print version - Page 2

unkn0wn · 10-24-2025, 05:47 AM

The WSJ Magazine and WSJ News recipes will no longer work. We have been extremely lucky for some time, since I found a workaround for WSJ with GraphQL.

archive.is is now being served a CAPTCHA page when it fetches content from WSJ, so we can't do anything to fix it. Maybe wait for archive.is to update.

nickredding · 11-03-2025, 04:24 AM

It appears that archive.is is screening traffic from wifi networks. Accessing archive.is via mobile networks (probably determined from carrier IP address) doesn’t attract the screening. I’m assuming this is an anti-scraping strategy, and it’s not confined to WSJ.

unkn0wn · 11-03-2025, 10:29 AM

Someone who is facing this issue should try adding a delay to the recipe and tell us if it works.

I don't use this recipe much and I could not replicate this issue for testing.
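For anyone who wants to try this, here is a minimal sketch of where the delay goes. It relies on the `delay` and `simultaneous_downloads` attributes of calibre's `BasicNewsRecipe`; the class name, title and the value of 5 seconds are placeholders I made up, not taken from the actual WSJ recipe. The fallback class exists only so the snippet can also be run outside calibre.

```python
# Sketch only: shows where download throttling is set in a recipe.
try:
    from calibre.web.feeds.news import BasicNewsRecipe  # available inside calibre
except ImportError:
    # Stand-in so the snippet can be run outside calibre (assumption for testing).
    class BasicNewsRecipe(object):
        delay = 0
        simultaneous_downloads = 5


class ThrottledWSJ(BasicNewsRecipe):  # hypothetical name, not the real recipe
    title = 'WSJ (throttled test)'    # placeholder title

    # Seconds to wait between consecutive downloads (placeholder value).
    delay = 5

    # Fetch one article at a time instead of several in parallel.
    simultaneous_downloads = 1
```

In an existing recipe you would only add the two attribute lines to the recipe class that is already there.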

nickredding · 11-05-2025, 10:48 AM

Delay doesn’t affect this. I get screened on the first attempt to load an article via wifi, but if I switch to mobile data and try again, the article loads. Note that I’m not using Calibre for this.

PowerfulGarbage · 12-24-2025, 04:45 PM

I’m also getting this error. Mobile data didn’t change it.

nickredding · 12-28-2025, 03:11 PM

archive.is uses a combination of web browser detection and geolocation to determine if a captcha challenge should be presented.

Any access from a web browser is challenged.

Any access from a USA IP address is challenged.

From an IP address outside of the USA, mobile apps using iOS or Android HTTP access are not challenged. However, VPNs don’t help; it seems archive.is detects them and issues a challenge.
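To make the combinations easier to follow, the observations above can be written as a small decision function. This is only a restatement of what I have seen, not archive.is's actual logic; the function name, the `'browser'` / `'native_app'` labels and the parameters are made up for illustration.

```python
def expect_captcha(client: str, country: str, via_vpn: bool = False) -> bool:
    """Return True if a captcha challenge is expected, per the observations above.

    client:  'browser' or 'native_app' (hypothetical labels)
    country: two-letter country code of the requesting IP address, e.g. 'US'
    via_vpn: whether the request goes through a VPN
    """
    if client == 'browser':
        return True   # any access from a web browser is challenged
    if country.upper() == 'US':
        return True   # any access from a USA IP address is challenged
    if via_vpn:
        return True   # VPNs appear to be detected and challenged
    return False      # native iOS/Android app outside the USA, no VPN


# Example: a native app on a Canadian IP address is the only unchallenged case.
print(expect_captcha('native_app', 'CA'))
```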

If the difference between web browser access and iOS/Android app access could be determined, it might be possible to modify the Python mechanize apparatus to mimic the native apps and get around the captcha challenge. However, it would only work for users outside of the USA.

So, unless someone can figure out how to successfully respond to a captcha challenge, it looks like the end of the line for recipes that depend on archive.is.

nickredding · 02-09-2026, 05:20 PM

It looks like Cloudflare is being used widely as an anti-scraping and bot blocking service.

Cloudflare has developed a mechanism called "Private Access Tokens" which is subscribed to by iOS and Android to provide validation that a network request is originating from an actual user device. This mechanism is invoked both by web browsers and native apps using iOS or Android network requests.

Private Access Tokens are intended to reduce (or even eliminate) the need for captcha challenges to block scrapers and bots, and it seems to be very successful.

It looks like archive.is is using Cloudflare and its own mechanisms (see my previous message) to repel scrapers and bots.

Interestingly, archive.is issues captcha challenges for access from the iOS Safari browser but not for native apps using iOS URLSession.

None of this suggests a way to get around Cloudflare: calibre is a web scraper, and Cloudflare is doing what it is designed to do by blocking it. But it does shed some light on why native apps that access network resources on demand (as opposed to batch scraping them) continue to work.
