Old 09-01-2005, 08:10 AM   #3
hacker
Technology Mercenary
Quote:
Originally Posted by Chaos
Just be sure to use the caching options, as most websites don't like having their RSS feeds hit more than once every 30 minutes (Slashdot, in particular, is very strict about this.)
Ironic you should mention that... I'm writing a paper titled "Why RSS Sucks", aimed specifically at the 99% of users and feed readers that ignore the entire point of RSS: to syndicate news. The paper goes through the various types of RSS (there are 7 formats), how they're delivered (details on each header, etc.), and how 99% of the readers out there completely misbehave when revisiting feeds.

We are seeing roughly 800 separate readers and agents hitting our RSS feeds every hour, even though the content on those feeds hasn't changed in weeks. Not only does the "Expires:" header specify a 2-week revisit interval, but the ETag for some of the stale news articles hasn't changed in months.

And yet these readers still pound the feed almost every hour. I've just started blocking them en masse. If they can't learn to use the standards, they can use them on someone else's pipe.
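
For anyone writing a reader, here is roughly what "using the standards" could look like. This is only a minimal sketch, not code from any real feed reader: the function names, the cache layout (a dict with "headers" and "body"), and the choice to also send If-Modified-Since are my own assumptions. The idea is to stay off the network entirely until the cached "Expires:" time has passed, and after that to send a conditional GET with the stored ETag so the server can answer with a bodyless 304.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def should_refetch(cached_headers, now):
    """Return True only once the cached copy's Expires time has passed."""
    expires = cached_headers.get("Expires")
    if expires is None:
        return True  # no freshness information, so ask the server
    try:
        return now >= parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return True  # unparseable or zone-less date: treat the copy as stale


def conditional_headers(cached_headers):
    """Build validators so an unchanged feed costs the server only a 304."""
    headers = {}
    if "ETag" in cached_headers:
        headers["If-None-Match"] = cached_headers["ETag"]
    if "Last-Modified" in cached_headers:
        headers["If-Modified-Since"] = cached_headers["Last-Modified"]
    return headers


def fetch_feed(url, cache=None):
    """Hypothetical helper: return a cache dict, hitting the network only when needed."""
    if cache and not should_refetch(cache["headers"], datetime.now(timezone.utc)):
        return cache  # still fresh: do not contact the server at all
    extra = conditional_headers(cache["headers"]) if cache else {}
    request = urllib.request.Request(url, headers=extra)
    try:
        with urllib.request.urlopen(request) as response:
            wanted = ("ETag", "Last-Modified", "Expires")
            kept = {k: response.headers[k] for k in wanted if response.headers[k]}
            return {"headers": kept, "body": response.read()}
    except urllib.error.HTTPError as err:
        if err.code == 304 and cache:
            # Not modified: keep the old body, but honor a renewed Expires.
            renewed = err.headers.get("Expires")
            if renewed:
                cache["headers"]["Expires"] = renewed
            return cache
        raise
```

With a 2-week "Expires:" like ours, a reader built this way would make one request per fortnight per feed instead of one per hour, and even that request would normally come back as a 304.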

Code:
# iptables-save | grep "dport 80" | wc -l
     622
(Some of these are entire /24 CIDRs)