New York Times recipe blocked as Bot

mr316 · 11-11-2025, 09:45 AM

Just started this morning, appears the NYT is classifying Calibre pulling news as a bot -

InputFormatPlugin: Recipe Input running
Downloading recipe urn: builtin:nytimes_sub
Trying to get latest version of recipe: nytimes_sub
Using user agent: User-Agent: Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; +http://archive.org/details/archive.org_bot)
Recipe specific options:
web = Todays Paper
days = 7
comp = no
Traceback (most recent call last):
File "runpy.py", line 198, in _run_module_as_main
File "runpy.py", line 88, in _run_code
File "site.py", line 83, in <module>
File "site.py", line 78, in main
File "site.py", line 50, in run_entry_point
File "calibre\utils\ipc\worker.py", line 213, in main
File "calibre\gui2\convert\gui_conversion.py", line 32, in gui_convert_recipe
File "calibre\gui2\convert\gui_conversion.py", line 26, in gui_convert
File "calibre\ebooks\conversion\plumber.py", line 1089, in run
File "calibre\customize\conversion.py", line 242, in __call__
File "calibre\ebooks\conversion\plugins\recipe_input.py", line 153, in convert
File "calibre\web\feeds\news.py", line 1122, in download
File "calibre\web\feeds\news.py", line 1300, in build_index
File "<string>", line 226, in parse_index
File "<string>", line 199, in parse_todays_page
File "calibre\web\feeds\news.py", line 752, in index_to_soup
File "mechanize\_mechanize.py", line 241, in open_novisit
File "mechanize\_mechanize.py", line 313, in _mech_open
mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 403: Not Allowed, Forbidden, Bot Blocked

kovidgoyal · 11-11-2025, 11:25 AM

Yeah, that started today. I don't see an easy workaround, however. They have decided to start blocking "bots". For a while, pretending to be the Wayback Machine got past it, but that doesn't work anymore.

philoufr · 11-14-2025, 03:13 AM

The New York Times Book Review recipe unfortunately is blocked, too.

Hoping you can find a workaround, as usual.

nickredding · 11-15-2025, 11:18 AM

Interestingly, the New York Times articles are all available via archive.is. I'm pretty sure this is a result of archive.is scraping the nytimes website because it seems unlikely that individual users are archiving articles. Alternatively, perhaps there is a bot that isn't seen as a bot because it has an nytimes subscription and uploads everything daily.

scottsan1 · 11-16-2025, 06:14 AM

also hoping for an eventual solution.

bhartman36 · 11-24-2025, 05:39 PM

Hoping for a workaround here, too. I'll use Instapaper for now, but it would be nice to be able to download the whole paper in one shot.

bllittle · 02-22-2026, 01:34 AM

Any updates or work-arounds for this?

jazzbox · 03-22-2026, 06:56 PM

For the past week, the recipe has been pulling section headers (but only headers) but not failing. Might this be an opening?

kovidgoyal · 03-22-2026, 10:43 PM

You mean it's downloading page titles, or that it's downloading the list of articles?

jazzbox · 03-23-2026, 01:33 AM

Page titles: 'The Front Page' 'International' 'National' etc as headers for separate (otherwise blank) pages

kovidgoyal · 03-23-2026, 07:22 AM

Ah no, that just means that nytimes is returning CAPTCHAs after the initial index download. Look at the download job log and you will see error messages about CAPTCHAs.

audio_inside · 04-04-2026, 06:47 PM

OK, after the latest changes at NYTimes.com I was also getting only the front page and section indexes with no article content. But last night I managed to create a scheme and a recipe that got the current articles from the NYTimes successfully downloaded (and then transferred to my Kindle Oasis.)

The scheme first requires logging into NYTimes.com with my subscription and then manually extracting the session cookie "NYT-S" and the anti-scraper DataDome cookie "datadome". I use the Account pane in the recipe to hold these cookies and inject them in place of my login and password. These cookies should not need refreshing for a while: "NYT-S" is only recreated on a new browser or after a logout, and "datadome" stays valid as long as the Times' "intelligent" DataDome firewall does not get suspicious and throw up a CAPTCHA.
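A minimal sketch of the cookie-injection step described above, not the poster's actual recipe. It assumes the "NYT-S" value is pasted into the account username field and the "datadome" value into the password field (the post does not say which field holds which); the helper names `cookies_from_account_fields` and `build_cookie_header` are hypothetical.

```python
def cookies_from_account_fields(username: str, password: str) -> dict[str, str]:
    """Treat the two account fields as cookie values rather than credentials.

    Assumption: username holds the NYT-S value, password holds the datadome value.
    """
    return {"NYT-S": username.strip(), "datadome": password.strip()}


def build_cookie_header(cookies: dict[str, str]) -> str:
    """Join cookies into the value of an HTTP Cookie request header."""
    for name, value in cookies.items():
        # Reject empty values and characters that would corrupt the header.
        if not value or any(ch in value for ch in ";\r\n"):
            raise ValueError(f"unusable value for cookie {name!r}")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    jar = cookies_from_account_fields("PASTE-NYT-S-HERE", "PASTE-DATADOME-HERE")
    print(build_cookie_header(jar))
```

Both values come from a logged-in browser session, so they act as credentials and should be kept as private as a password.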

The NYTimes also does not like Calibre's in-built headless browser, so I had to spin up a FlareSolverr instance in Docker on my server which exposes a Chrome browser to use as a proxy; I point the recipe to that browser on one of my server ports.
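As a rough illustration of routing a fetch through FlareSolverr, the sketch below posts a `request.get` command to its HTTP API and pulls the page HTML out of the reply. The endpoint `http://localhost:8191/v1` is an assumption (FlareSolverr's default port; the post only says "one of my server ports"), and the helper names are hypothetical. It is not the poster's recipe code.

```python
import json
import urllib.request

# Assumed endpoint: FlareSolverr's default port on the local machine.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_get_payload(url: str, cookies: dict[str, str],
                      max_timeout_ms: int = 60000) -> dict:
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": max_timeout_ms,
        "cookies": [{"name": n, "value": v} for n, v in cookies.items()],
    }


def extract_html(reply: dict) -> str:
    """Return the page HTML from a FlareSolverr reply, or raise on failure."""
    if reply.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {reply.get('message')}")
    return reply["solution"]["response"]


def fetch_html(url: str, cookies: dict[str, str],
               endpoint: str = FLARESOLVERR_URL) -> str:
    """POST the command to FlareSolverr and return the rendered HTML."""
    body = json.dumps(build_get_payload(url, cookies)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return extract_html(json.load(resp))
```

A recipe could call something like `fetch_html(article_url, cookies)` wherever it would otherwise open the URL directly; how that is wired into a particular recipe is left open here.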

Also, to get this to work I had to restrict the recipe to only 1 download at a time to avoid arousing the suspicions of DataDome's anti-scraper algorithms. I also commented out a number of NYTimes content sections that I'm not interested in and asked only for the articles from the last 24 hours to keep the download time reasonable. Even so, it required 2 hours to download the entire edition of the paper, and it puts a fair CPU load onto my Celeron-based NAS server.
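In a calibre recipe, the documented `BasicNewsRecipe` attributes `simultaneous_downloads`, `delay` and `oldest_article` are the usual knobs for this kind of throttling; the post does not say which values were used. Independent of calibre, the idea of one download at a time with a minimum gap between requests can be sketched as below. The function name `fetch_serially` and the 5-second default are illustrative assumptions.

```python
import time
from typing import Callable, Iterable


def fetch_serially(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_interval_s: float = 5.0,  # assumed gap; tune to what the site tolerates
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch URLs strictly one at a time, spacing request starts apart."""
    results: dict[str, str] = {}
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the minimum interval.
            wait = min_interval_s - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[url] = fetch(url)
    return results
```

The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real waiting.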

To reduce the download time I may try altering the way the FlareSolverr Chrome browser is instantiated so that it remains running between fetches instead of being re-started for each download.
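One way to keep the browser alive between fetches is FlareSolverr's session commands: `sessions.create` starts a persistent browser, `request.get` with a `session` field reuses it, and `sessions.destroy` shuts it down. The sketch below only builds the command bodies in order; the session id `"nyt"` and the helper names are assumptions, and whether the poster took this route is not stated.

```python
def create_session_payload(session_id: str) -> dict:
    """Command body that starts a persistent browser session."""
    return {"cmd": "sessions.create", "session": session_id}


def session_get_payload(url: str, session_id: str,
                        max_timeout_ms: int = 60000) -> dict:
    """Command body for a GET that reuses an existing session's browser."""
    return {
        "cmd": "request.get",
        "url": url,
        "session": session_id,
        "maxTimeout": max_timeout_ms,
    }


def destroy_session_payload(session_id: str) -> dict:
    """Command body that closes the session's browser when the run is done."""
    return {"cmd": "sessions.destroy", "session": session_id}


def plan_commands(urls: list[str], session_id: str = "nyt") -> list[dict]:
    """Create once, fetch every URL through the same browser, destroy once."""
    return (
        [create_session_payload(session_id)]
        + [session_get_payload(u, session_id) for u in urls]
        + [destroy_session_payload(session_id)]
    )
```

Each body would be POSTed to the FlareSolverr endpoint in turn; destroying the session at the end matters on a small NAS, since an orphaned browser keeps holding memory.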

Has anybody else tried any of these techniques to get past the current NYTimes.com roadblocks?

audio_inside · 04-05-2026, 04:16 PM

I've made a few changes to my recipe and how it accesses FlareSolverr, and I'm now getting a day and a half's worth of NYTimes articles (145 total) downloaded in an hour and 10 minutes. (I'm on a 300Mb/s connection.)
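For anyone comparing rates, the figures above (145 articles in 1 hour 10 minutes) work out as follows; the helper names are only for illustration.

```python
def seconds_per_article(articles: int, minutes: float) -> float:
    """Average wall-clock seconds spent per downloaded article."""
    return minutes * 60 / articles


def articles_per_minute(articles: int, minutes: float) -> float:
    """Average number of articles downloaded per minute."""
    return articles / minutes


if __name__ == "__main__":
    # 145 articles in 70 minutes, as reported in the post.
    print(round(seconds_per_article(145, 70), 1))   # 29.0
    print(round(articles_per_minute(145, 70), 2))   # 2.07
```

So the improved scheme averages roughly 29 seconds per article, or about two articles a minute.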

Big improvement on my original scheme! Just as a comparison point, what rates were people getting from the NYTimes before the last round of anti-scraper algorithms were deployed?

I am going to tweak it a little more but at this point I think my fetches are rate-limited by the Celeron processor in my Synology NAS, which is hitting 75%-85% CPU load during these downloads.



