#1
Zealot
Posts: 139
Karma: 33000
Join Date: Feb 2010
Device: Currently: Voyage, Oasis 3, Kindle mobile apps, and Kindle Fire

New York Times - no articles
Just section headings. No articles downloaded.
Calibre 6.8. The log is from using the built-in recipe, today, Nov 13, 2022.

EDIT: Sorry, but I've deleted most of the log text from the original post because it was making the post HUGE. Besides, Kovid has already seen it and addressed the issue by adding a delay to the recipe to try to avoid NYTimes bot detection, which seemed to be causing the problem. (I've just left a small snippet of the log below.) I'm sure if you experience the same issue you can check your own logs.

And BTW, this happens with all NYTimes recipes: Daily, Web version, NYTimes Book Review.

    Failed to download article: 6 Dead After Planes Collide in Midair at Dallas Air Show, Official Says from https://www.nytimes.com/2022/11/12/u...sh-dallas.html
    Fetching https://www.nytimes.com/2022/11/11/a...tman-dead.html
    Traceback (most recent call last):
      File "calibre\utils\threadpool.py", line 99, in run
      File "calibre\web\feeds\news.py", line 1185, in fetch_article
      File "calibre\web\feeds\news.py", line 1180, in _fetch_article
    Exception: Could not fetch article. The debug traceback is available earlier in this log

Last edited by mkgtu; 11-14-2022 at 07:50 PM. Reason: shorten length of post
#2
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet

I'm having the same problem with the Times, and with the NY Times Book Review as well.

Let me add another one: NY Times Sports Beat.

Last edited by NSILMike; 11-13-2022 at 01:56 PM. Reason: Update
#3
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

#4
Zealot
Posts: 139
Karma: 33000
Join Date: Feb 2010
Device: Currently: Voyage, Oasis 3, Kindle mobile apps, and Kindle Fire

Tried adding the "delay" line from the recipe on GitHub in the link above.

One time the paper downloaded with articles in the "Front Page" section, but all other sections were empty (headings only). Tried several more times and all sections came up empty. Tried changing the delay to 2 seconds, then 3 secs, then 5 secs. Still no articles.

Maybe I'm doing something wrong. This is the relevant section of my "customized recipe" with the delay at line 94:

    class NewYorkTimes(BasicNewsRecipe):
        if is_web_edition:
            title = 'The New York Times (Web)'
            description = 'New York Times (Web). You can edit the recipe to remove sections you are not interested in.'
        else:
            title = 'NY Times ***'
            description = 'Today\'s New York Times'
        encoding = 'utf-8'
        __author__ = 'Kovid Goyal'
        language = 'en'
        ignore_duplicate_articles = {'title', 'url'}
        no_stylesheets = True
        compress_news_images = True
        compress_news_images_auto_size = 5
        conversion_options = {'flow_size': 0}
        delay = 0 if use_wayback_machine else 1

        extra_css = '''
        body {text-align: left}
        '''

        @property
        def nyt_parser(self):
            ans = getattr(self, '_nyt_parser', None)
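
For anyone repeating this experiment, a minimal sketch of how that `delay` line evaluates may help. It assumes `use_wayback_machine` is a module-level flag defined earlier in the recipe (the excerpt only shows it being read), and `FETCH_DELAY_SECONDS` is a hypothetical name introduced here so the value can be changed in one place. calibre's documentation describes `delay` as the pause, in seconds, between consecutive downloads.

```python
# Stand-in for the flag the recipe excerpt reads; assumed to be
# defined near the top of the real recipe file.
use_wayback_machine = False

# Hypothetical constant (not in the built-in recipe):
# seconds to wait between consecutive downloads.
FETCH_DELAY_SECONDS = 5


class DelaySketch:
    # Same conditional as in the excerpt, with the hard-coded 1
    # replaced by the constant above.
    delay = 0 if use_wayback_machine else FETCH_DELAY_SECONDS


print(DelaySketch.delay)
```

In the real recipe the only change would be the number after `else`; the class and constant names here are for illustration only.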
#5
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

You need to wait or use a new IP address, as yours will likely have been blocked for a time.
#6
Zealot
Posts: 139
Karma: 33000
Join Date: Feb 2010
Device: Currently: Voyage, Oasis 3, Kindle mobile apps, and Kindle Fire

OK.

I just turned on a VPN and started the NYTimes download again. It seems to be downloading lots of content. Not finished yet.

Might be my fault. On my last attempt I had changed the delay to 5. I'll have to go back and change it to 1 again.

I wonder if there's a recipe around that adds a subscription login, or code that could be added to a custom recipe to include login. And would that circumvent any bot issues?

I have a NYT digital subscription, so I have all the apps and website access. But that doesn't include the Kindle version, and the Kindle is the device I prefer to use over coffee at my local cafe in the morning. I hate to pay another $20 for the Kindle sub.
#7
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

No, IIRC NYT does bot detection even for logged in fetches.
#8
Zealot
Posts: 139
Karma: 33000
Join Date: Feb 2010
Device: Currently: Voyage, Oasis 3, Kindle mobile apps, and Kindle Fire

The one second delay is likely not enough. (This is California, USA.)

Calibre on my PC is currently trying to download NYT (probably unsuccessfully! I haven't looked). But I also just tried to open a NYT article using a weblink in an email on my phone (on the same WiFi and IP address as my PC) and got the attached notice that I am a suspected BOT because of some "too fast for a human" clicking coming from this IP address.

I switched my phone to a VPN and the article opened without a problem. Also turned off WiFi and successfully used the Verizon mobile data connection.

So I guess I'll need to experiment with delay settings. Hope it doesn't have to be too long. The download happens at 6am daily. If downloads start taking too long I guess I can move it back to 5am.
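
The "too fast for a human" notice suggests the site is reacting to the request rate coming from one IP address. As a rough illustration of what a per-request delay does, the sketch below waits a fixed number of seconds between consecutive fetches. This is not calibre's implementation; `paced_fetch`, `fetch`, and the `sleep` parameter are hypothetical names, and the demo uses a dummy fetcher so nothing touches the network.

```python
import time
from typing import Callable, Iterable, List


def paced_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs one at a time, pausing delay_seconds between fetches."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request; pause before every later one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Dummy fetcher: upper-cases the "URL" instead of downloading anything.
    pages = paced_fetch(["a", "b", "c"], fetch=lambda u: u.upper(), delay_seconds=0.1)
    print(pages)
```

Passing `sleep` as a parameter is only there so different delay values can be tried without actually waiting.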
#9
Zealot
Posts: 139
Karma: 33000
Join Date: Feb 2010
Device: Currently: Voyage, Oasis 3, Kindle mobile apps, and Kindle Fire

Tried a 2 second delay with a new IP: only got part of the first section before failure.

Tried 5 secs with a different VPN server: got several partial sections, the rest empty. 10 secs wasn't much better.

Anyway, skipping to the end... Had full success, all 10 sections from today fully downloaded, using a 15 second delay on yet another VPN from a different provider (Bitdefender), which may have been a slower server. Took 41 minutes to download.

So my game plan for tomorrow is a 15 second delay, auto-starting the download an hour earlier (5am instead of 6am), and leaving the VPN on overnight so that if I get CAUGHT it won't mess with using my phone for NYT articles (for which I have a paid subscription).

I should also note that with the last 15 sec delay attempt I had also commented out 8 of the sections, leaving only 12 available for possible download. Note: not all NYT sections are published every day. Today there were 10.

Last edited by mkgtu; 11-15-2022 at 11:47 AM.
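
As a back-of-the-envelope check on those numbers: assuming fetches run one at a time (calibre's documentation says simultaneous downloads are reduced to 1 when a delay is set) and ignoring the transfer time itself, a 15 second delay over a 41 minute run allows roughly 164 fetches. The helper names below are hypothetical.

```python
def max_fetches(run_minutes: float, delay_seconds: float) -> int:
    """Approximate upper bound on sequential fetches: how many delays fit in the run."""
    return int(run_minutes * 60 // delay_seconds)


def min_run_minutes(fetch_count: int, delay_seconds: float) -> float:
    """Approximate lower bound on run time: delays only, transfer time ignored."""
    return fetch_count * delay_seconds / 60


print(max_fetches(41, 15))       # 41 * 60 // 15
print(min_run_minutes(164, 15))  # 164 * 15 / 60
```

If the delay dominates the run time, a 5am start with a similar article count should finish well before 6am.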