Quote:
Originally Posted by nickredding
True, but the article content is encapsulated in JSON within the article page, so fiddling with the UA is not necessary--the JSON is there with a standard UA header.
It seems that major publications are moving to a Content Management System that does this JSON encapsulation as a method of defeating simple screen scrapers. It also seems that RSS feeds are going out of style and relying on RSS in the future for indexes will be increasingly unreliable.
|
That's irrelevant: you will get a captcha page if you try to download without an appropriate user-agent. NYTimes uses captcha-delivery.com on its article pages.
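
For anyone who wants to check this themselves, here is a rough sketch in plain Python (standard library only) of sending a browser-like user-agent and testing whether the response is a captcha page. The UA string, the helper names, and the idea that the captcha page body mentions captcha-delivery.com are all my assumptions for illustration, not something confirmed for every page.

```python
import urllib.error
import urllib.request

# Assumption: an example browser-like UA string; whether the site accepts it may vary.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# Assumption: the captcha page references this domain somewhere in its HTML.
CAPTCHA_MARKER = "captcha-delivery.com"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_like_captcha(html):
    """Heuristic check: does the page body mention the captcha domain?"""
    return CAPTCHA_MARKER in html


def fetch(url, user_agent=BROWSER_UA):
    """Download a page and raise if a captcha page came back instead."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # An error status (e.g. 403) still has a body we can inspect.
        html = err.read().decode("utf-8", errors="replace")
    if looks_like_captcha(html):
        raise RuntimeError("got a captcha page instead of the article")
    return html
```

Calling `fetch()` on an article URL with the default Python user-agent versus a browser-like one should show the difference, assuming the marker check holds for the page in question.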