Thread: NY Times fails
11-18-2022, 01:34 PM   #42
mkgtu
I checked the Internet Archive website and found 11 "snapshots". Any of the 10 from 3 am onward gave a pretty up-to-date version of today's edition.

But if you click on the "Today's Paper" link at the top of the page you get a TWO DAY OLD PAPER. I'm not really good at reading these recipes, but it looks to my untrained eye like the current recipe is asking for Today's Paper:


def read_todays_paper(self):
    INDEX = 'https://www.nytimes.com/section/todayspaper'
    # INDEX = 'file:///t/raw.html'
    return self.index_to_soup(self.get_nyt_page(INDEX))

If that's the case, and since it looks like Today's Paper in the Internet Archive is out of date (at least today it is), maybe the recipe shouldn't be asking for it. Might be better off with just the most recent "snapshot" at the time of download.
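
Here's a rough sketch of what "just grab the most recent snapshot" could look like. It's only my guess at an approach, not what the recipe does: it assumes the Wayback Machine availability endpoint (https://archive.org/wayback/available), which as far as I can tell returns the latest capture when no timestamp is given. The helper names and the sample response are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_API = 'https://archive.org/wayback/available'


def build_availability_url(page_url, timestamp=None):
    """Build the query URL (hypothetical helper, not part of the recipe).

    timestamp is an optional YYYYMMDDhhmmss string; leaving it out
    should ask for the most recent capture.
    """
    params = {'url': page_url}
    if timestamp:
        params['timestamp'] = timestamp
    return AVAILABILITY_API + '?' + urlencode(params)


def closest_snapshot_url(payload):
    """Pull the snapshot URL out of a decoded JSON response, or None."""
    closest = payload.get('archived_snapshots', {}).get('closest')
    if not closest or not closest.get('available'):
        return None
    return closest.get('url')


# Made-up sample response, shaped like what the endpoint returns.
sample = json.loads('''
{
  "archived_snapshots": {
    "closest": {
      "available": true,
      "url": "http://web.archive.org/web/20221118080000/https://www.nytimes.com/",
      "timestamp": "20221118080000",
      "status": "200"
    }
  }
}
''')

print(build_availability_url('https://www.nytimes.com/'))
print(closest_snapshot_url(sample))
```

In a real recipe you would fetch the query URL, decode the JSON, and hand the resulting snapshot URL to index_to_soup instead of the Today's Paper link.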