Old 11-11-2025, 09:45 AM   #1
mr316
Junior Member
mr316 began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2025
Device: Samsung Galaxy Tab S8
New York Times recipe blocked as Bot

This just started this morning. It appears the NYT is classifying Calibre's news download as a bot:

InputFormatPlugin: Recipe Input running
Downloading recipe urn: builtin:nytimes_sub
Trying to get latest version of recipe: nytimes_sub
Using user agent: User-Agent: Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; +http://archive.org/details/archive.org_bot)
Recipe specific options:
web = Todays Paper
days = 7
comp = no
Traceback (most recent call last):
File "runpy.py", line 198, in _run_module_as_main
File "runpy.py", line 88, in _run_code
File "site.py", line 83, in <module>
File "site.py", line 78, in main
File "site.py", line 50, in run_entry_point
File "calibre\utils\ipc\worker.py", line 213, in main
File "calibre\gui2\convert\gui_conversion.py", line 32, in gui_convert_recipe
File "calibre\gui2\convert\gui_conversion.py", line 26, in gui_convert
File "calibre\ebooks\conversion\plumber.py", line 1089, in run
File "calibre\customize\conversion.py", line 242, in __call__
File "calibre\ebooks\conversion\plugins\recipe_input.py", line 153, in convert
File "calibre\web\feeds\news.py", line 1122, in download
File "calibre\web\feeds\news.py", line 1300, in build_index
File "<string>", line 226, in parse_index
File "<string>", line 199, in parse_todays_page
File "calibre\web\feeds\news.py", line 752, in index_to_soup
File "mechanize\_mechanize.py", line 241, in open_novisit
File "mechanize\_mechanize.py", line 313, in _mech_open
mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 403: Not Allowed, Forbidden, Bot Blocked
Old 11-11-2025, 11:25 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 46,198
Karma: 29626604
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah, that started today. I don't see an easy workaround, however. They have decided to start blocking "bots". For a while, pretending to be the Wayback Machine got past it, but that doesn't work anymore.
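
For context, "pretending to be the Wayback Machine" amounts to sending a crawler's User-Agent header with each request. A minimal standard-library sketch of the idea follows. The string is the one shown in the job log in post #1; the `build_request` helper and the example URL are hypothetical, and per the post above the NYT no longer lets this through.

```python
from urllib.request import Request

# User agent string copied from the job log in post #1.
WAYBACK_UA = (
    "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; "
    "+http://archive.org/details/archive.org_bot)"
)


def build_request(url: str, user_agent: str = WAYBACK_UA) -> Request:
    """Return a request object that carries the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical target URL; nothing is sent over the network here.
req = build_request("https://www.nytimes.com/")
print(req.get_header("User-agent"))
```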
Old 11-14-2025, 03:13 AM   #3
philoufr
Member
philoufr began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Nov 2011
Device: Kindle Paperwhite
The New York Times Book Review recipe unfortunately is blocked, too.

Hoping you can find a workaround, as usual.
Old 11-15-2025, 11:18 AM   #4
nickredding
onlinenewsreader.net
nickredding knows the difference between 'who' and 'whom'
 
Posts: 336
Karma: 10143
Join Date: Dec 2009
Location: Kelowna BC
Device: Various
Interestingly, the New York Times articles are all available via archive.is. I'm pretty sure this is a result of archive.is scraping the nytimes website because it seems unlikely that individual users are archiving articles. Alternatively, perhaps there is a bot that isn't seen as a bot because it has an nytimes subscription and uploads everything daily.
Old 11-16-2025, 06:14 AM   #5
scottsan1
Junior Member
scottsan1 began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2021
Device: iPad mini
Also hoping for an eventual solution.
Old 11-24-2025, 05:39 PM   #6
bhartman36
Wizard
bhartman36 ought to be getting tired of karma fortunes by now.

Posts: 1,325
Karma: 1515835
Join Date: Mar 2009
Location: New Jersey, USA
Device: Kobo Libra Colour, Kindle Paperwhite Signature Edition (2021)
Hoping for a workaround here, too. I'll use Instapaper for now, but it would be nice to be able to download the whole paper in one shot.
Old 02-22-2026, 01:34 AM   #7
bllittle
Junior Member
bllittle began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Jan 2021
Device: Kindle app on android tablet
Any updates or workarounds for this?
Old 03-22-2026, 06:56 PM   #8
jazzbox
Member
jazzbox doesn't litter
 
Posts: 23
Karma: 190
Join Date: Nov 2017
Device: Kindle paperwhite
For the past week, the recipe has been pulling section headers (but only headers) but not failing. Might this be an opening?
Old 03-22-2026, 10:43 PM   #9
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 46,198
Karma: 29626604
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You mean it's downloading page titles, or that it's downloading the list of articles?
Old 03-23-2026, 01:33 AM   #10
jazzbox
Member
jazzbox doesn't litter
 
Posts: 23
Karma: 190
Join Date: Nov 2017
Device: Kindle paperwhite
Page titles: 'The Front Page', 'International', 'National', etc., as headers for separate (otherwise blank) pages.
Old 03-23-2026, 07:22 AM   #11
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 46,198
Karma: 29626604
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Ah no, that just means that the nytimes is returning CAPTCHAs after the initial index download. Look at the download job log and you will see error messages about CAPTCHAs.
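
If the job log is long, a quick filter can pull out the relevant lines. This is only a sketch: the exact wording of calibre's messages is not shown in this thread, so it simply searches for the word "captcha" in any letter case, and the `captcha_lines` helper and the `job.log` file name are hypothetical.

```python
import re
from typing import Iterable, List

# Case-insensitive match on the word itself; the real message text may differ.
CAPTCHA_RE = re.compile(r"captcha", re.IGNORECASE)


def captcha_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a CAPTCHA, stripped of whitespace."""
    return [line.strip() for line in log_lines if CAPTCHA_RE.search(line)]


# Usage, assuming the job log was saved to a text file named job.log:
#     with open("job.log", encoding="utf-8") as fh:
#         for hit in captcha_lines(fh):
#             print(hit)
```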
Old 04-04-2026, 06:47 PM   #12
audio_inside
Junior Member
audio_inside began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2026
Device: Kindle Oasis
Bypassing CAPTCHAs

OK, after the latest changes at NYTimes.com I was also getting only the front page and section indexes with no article content. But last night I managed to create a scheme and a recipe that successfully downloaded the current articles from the NYTimes (and then transferred them to my Kindle Oasis).

The scheme first requires logging into NYTimes.com with my subscription and then manually extracting the session cookie "NYT-S" and the anti-scraper DataDome cookie "datadome". I use the Account pane in the recipe to hold these cookies and inject them in place of my login and password. These cookies should not need refreshing for a while - "NYT-S" is only recreated on a new browser or after a logout, and "datadome" stays valid as long as the Times' "intelligent" DataDome firewall does not become suspicious and throw up a CAPTCHA.
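
As a rough illustration of the cookie injection described above (not the actual recipe code, which is not shown in this post), the two values can be joined into a single Cookie header value. The `build_cookie_header` helper is hypothetical; only the cookie names "NYT-S" and "datadome" come from the post.

```python
def build_cookie_header(nyt_s: str, datadome: str) -> str:
    """Join the two cookie values into a single Cookie request header value."""
    cookies = {"NYT-S": nyt_s.strip(), "datadome": datadome.strip()}
    for name, value in cookies.items():
        if not value:
            raise ValueError(f"missing value for cookie {name!r}")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder values; the real ones are copied from the logged-in browser.
header = build_cookie_header("SESSION-VALUE", "DATADOME-VALUE")
print(header)  # NYT-S=SESSION-VALUE; datadome=DATADOME-VALUE
```

In a recipe, the two arguments would presumably be whatever was typed into the username and password fields of the Account pane.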

The NYTimes also does not like Calibre's built-in headless browser, so I had to spin up a FlareSolverr instance in Docker on my server, which exposes a Chrome browser to use as a proxy; I point the recipe to that browser on one of my server ports.
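
For anyone who has not used FlareSolverr: it accepts JSON commands over HTTP and returns the rendered page. A minimal sketch of a single fetch follows, assuming an instance listening on its default port 8191 on localhost; the helper names are hypothetical, and how the recipe in this post actually calls FlareSolverr is not shown.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint: FlareSolverr's default port is 8191; adjust host and port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_get_command(url: str, timeout_ms: int = 60000) -> dict:
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": timeout_ms}


def extract_html(reply: dict) -> str:
    """Return the page HTML from a FlareSolverr reply, or raise on failure."""
    if reply.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {reply.get('message')}")
    return reply["solution"]["response"]


def fetch_via_flaresolverr(url: str, endpoint: str = FLARESOLVERR_URL) -> str:
    """POST the command to FlareSolverr and return the rendered HTML."""
    body = json.dumps(build_get_command(url)).encode("utf-8")
    req = Request(endpoint, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=120) as resp:
        return extract_html(json.load(resp))
```

Calling `fetch_via_flaresolverr(article_url)` needs a running FlareSolverr container; the other two functions can be exercised without one.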

Also, to get this to work I had to restrict the recipe to only 1 download at a time to avoid arousing the suspicions of DataDome's anti-scraper algorithms. I also commented out a number of NYTimes content sections that I'm not interested in and asked only for the articles from the last 24 hours to keep the download time reasonable. Even so, it took 2 hours to download the entire edition of the paper, and it puts a fair CPU load on my Celeron-based NAS server.
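
In a calibre recipe, one download at a time is presumably set through the `simultaneous_downloads` attribute of `BasicNewsRecipe`, optionally with `delay` to space requests out. The same idea, sketched outside calibre with only the standard library: fetch URLs strictly one after another, with a minimum interval between request starts. The `fetch_sequentially` helper and the 5 second default are assumptions for illustration, not values from the post.

```python
import time
from typing import Callable, Iterable, List


def fetch_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Fetch each URL in order, one at a time, pausing between requests."""
    results: List[str] = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request began.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be checked without real waiting.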

To reduce the download time I may try altering the way the FlareSolverr Chrome browser is instantiated so that it remains running between fetches instead of being re-started for each download.
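
One way to keep the browser alive between fetches is FlareSolverr's session commands: create a session once, pass its id with every `request.get`, and destroy it at the end. The sketch below only builds that command sequence; the `session_commands` helper and the session id are hypothetical, and this is not necessarily the change made in the recipe discussed here.

```python
from typing import Dict, List


def session_commands(session_id: str, urls: List[str]) -> List[Dict[str, object]]:
    """Return the ordered FlareSolverr commands for one reused browser session."""
    # Start one browser instance that stays up for all following requests.
    commands: List[Dict[str, object]] = [
        {"cmd": "sessions.create", "session": session_id}
    ]
    for url in urls:
        commands.append(
            {
                "cmd": "request.get",
                "url": url,
                "session": session_id,  # reuse the browser created above
                "maxTimeout": 60000,
            }
        )
    # Shut the browser down so it does not linger on the server.
    commands.append({"cmd": "sessions.destroy", "session": session_id})
    return commands
```

Each dictionary would be sent as the JSON body of a POST to the FlareSolverr `/v1` endpoint, in order.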

Has anybody else tried any of these techniques to get past the current NYTimes.com roadblocks?
Old 04-05-2026, 04:16 PM   #13
audio_inside
Junior Member
audio_inside began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2026
Device: Kindle Oasis
I've made a few changes to my recipe and how it accesses FlareSolverr, and I'm now getting a day and a half's worth of NYTimes articles (145 total) downloaded in an hour and 10 minutes. (I'm on a 300Mb/s connection.)

Big improvement on my original scheme! Just as a comparison point, what rates were people getting from the NYTimes before the last round of anti-scraper algorithms were deployed?

I am going to tweak it a little more but at this point I think my fetches are rate-limited by the Celeron processor in my Synology NAS, which is hitting 75%-85% CPU load during these downloads.
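
For anyone comparing rates, the figures above (145 articles in an hour and 10 minutes) work out as follows:

```python
articles = 145
elapsed_minutes = 70  # "an hour and 10 minutes"

seconds_per_article = elapsed_minutes * 60 / articles
articles_per_minute = articles / elapsed_minutes

print(f"{seconds_per_article:.1f} s per article")        # 29.0 s per article
print(f"{articles_per_minute:.2f} articles per minute")  # 2.07 articles per minute
```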

Last edited by audio_inside; 04-05-2026 at 04:18 PM.
All times are GMT -4.