Actually, I've been thinking about doing something vaguely like this. I recently saw a set of scripts someone put together to pull Wikipedia content together (kind of like web spidering, starting with a small set of articles) and put it on an iPod. I thought that would be cool to do, and ultimately it should be generalized. One approach with respect to RSS would be to flag each feed as either complete or partial and, for the partial ones, follow the links to the full text. This might require some additional configuration to know how to extract the signal from the noise of the full pages, but in many cases this could be done just by scanning for the appropriate DIV tag class attributes.
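
To make the partial-feed idea a bit more concrete, here is a rough sketch in Python using only the standard library. Everything named here is my own assumption for illustration: the per-feed settings `full_text` and `content_class`, the item keys `summary` and `link`, and the `fetch` callable that returns a page's HTML. It just keeps the text found inside the first DIV whose class list contains the configured name, so it is a starting point rather than a finished scraper.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text inside the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # div nesting depth inside the matched div; 0 means outside
        self.done = False    # True once the matched div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            # A nested div inside the article body: track it so we find the right close tag.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_article(html, target_class):
    """Return the whitespace-normalized text of the matching div, or '' if none matches."""
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def item_text(feed, item, fetch):
    """Use the feed's own text if it is flagged complete, else follow the link.

    feed  -- hypothetical config dict: {'full_text': bool, 'content_class': str}
    item  -- hypothetical entry dict: {'summary': str, 'link': str}
    fetch -- any callable mapping a URL to that page's HTML as a string
    """
    if feed["full_text"]:
        return item["summary"]
    return extract_article(fetch(item["link"]), feed["content_class"])
```

In practice `fetch` could be a thin wrapper around `urllib.request.urlopen`, and the class name would have to be worked out per site by looking at the page source. Pages that keep scripts or ads inside the article DIV would need extra filtering on top of this.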