Originally Posted by kovidgoyal
Interesting. What is the difficulty with caching? Presumably the algorithm would be something along the lines of: user requests a feed; fetch feed.xml; if the cached copy is younger than the TTL, return the previously generated ebook; otherwise, re-generate it. You should probably not use wget for fetching, as that must impose a lot of extra process overhead on the server.
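For what it's worth, the TTL check described above can be sketched in a few lines of Python. This is just an illustration of the idea, not actual code from the site: the cache path layout and the `regenerate` callable (which would fetch feed.xml and build the ebook) are assumptions.

```python
import os
import time

TTL = 30 * 60  # assumed TTL of 30 minutes; tune to taste


def get_ebook(feed_url, cache_path, regenerate):
    """Serve a cached ebook if it is fresh, otherwise rebuild it.

    `regenerate` is a hypothetical callable that fetches feed.xml
    and writes the generated ebook to `cache_path`.
    """
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < TTL:
            # Cached copy is younger than the TTL: serve it as-is.
            return cache_path
    # Missing or stale: regenerate, then serve the fresh copy.
    regenerate(feed_url, cache_path)
    return cache_path
```

The file's mtime doubles as the cache timestamp, so no extra bookkeeping is needed.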
The problem is not caching the feeds themselves, but the images. Right now, what slows us down is having to download the images before generating the files; wget is used to cache the images, not the feeds. Updating the feeds only when someone asks for them isn't the best idea either: if there are images in the feed, the user will have to wait much longer. The best approach is to update the cache automatically in the background.
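A background refresher along those lines could look like the sketch below. Everything here is an assumption for illustration: the cache directory, the refresh interval, and the list of feed URLs; a real deployment would also fetch the images each feed references, not just the feed files.

```python
import os
import threading
import urllib.request

CACHE_DIR = '/tmp/feed_cache'   # assumed cache location
REFRESH_INTERVAL = 15 * 60      # assumed 15-minute refresh cycle


def refresh_cache(feed_urls, stop_event):
    """Periodically re-download the feeds so a user request
    never has to wait on the network."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    while not stop_event.is_set():
        for url in feed_urls:
            dest = os.path.join(CACHE_DIR,
                                os.path.basename(url) or 'feed.xml')
            try:
                urllib.request.urlretrieve(url, dest)
            except OSError:
                pass  # network error: keep serving the stale copy
        # Sleep until the next cycle, waking early if asked to stop.
        stop_event.wait(REFRESH_INTERVAL)


def start_background_refresh(feed_urls):
    """Run the refresher in a daemon thread; set() the returned
    event to stop it."""
    stop = threading.Event()
    t = threading.Thread(target=refresh_cache,
                         args=(feed_urls, stop), daemon=True)
    t.start()
    return stop
```

On a fetch failure the stale copy is deliberately kept, so users still get something rather than an error.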
And you can hardly cache the generated files themselves in this situation: most people create customized newspapers with multiple feeds in them.
Once the feeds and the images are in the cache, generating a file from them is easy and fast (and yes, unlike with desktop software, people expect a website to be fast: 10-20 seconds tops).