I am trying to write a simple script for web2lrf/libprs500 to download The Atlantic magazine (http://www.theatlantic.com/), which is free as of today...
Damn, I don't know Python, so I have a couple of problems...
1) Under http://www.theatlantic.com/doc/current all the links are relative (e.g. <a href="/doc/200801/millbank">), so I began with:
import re

preprocess_regexps = [
    (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
    [
        (r'<a href="/', lambda match: match.group().replace(match.group(1), '<a href="http://www.theatlantic.com')),
    ]
]
Is this right?
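As written, the lambda calls match.group(1), but the pattern r'<a href="/' contains no parenthesized group, so Python's re module raises IndexError ("no such group") the first time the pattern matches. Below is a minimal sketch of one fix. It assumes web2lrf applies each (pattern, function) pair the way re.sub does. The apply_preprocess helper and the BASE constant are hypothetical, there only so the rule can be tried outside libprs500:

```python
import re

# Hypothetical constant: the site prefix to put in front of root-relative links.
BASE = "http://www.theatlantic.com"

# Same shape as the original: a list of (compiled pattern, function) pairs.
preprocess_regexps = [
    (re.compile(pattern, re.IGNORECASE | re.DOTALL), func)
    for pattern, func in [
        # Group 1 captures the tag opening ('<a href="'), so match.group(1)
        # exists; the '/' after it is re-emitted behind the site prefix.
        (r'(<a\s+href=")/', lambda match: match.group(1) + BASE + "/"),
    ]
]


def apply_preprocess(html):
    """Hypothetical helper: apply each pair in turn, the way re.sub does."""
    for pattern, func in preprocess_regexps:
        html = pattern.sub(func, html)
    return html


sample = '<a href="/doc/200801/millbank">Millbank</a>'
print(apply_preprocess(sample))
# → <a href="http://www.theatlantic.com/doc/200801/millbank">Millbank</a>
```

This rule only rewrites double-quoted href attributes that start with "/" and directly follow "<a". Absolute links such as href="http://..." are left untouched because they do not begin with "/".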
2) At the end of every run I get this error (freely translated by me, since I'm on an Italian version of Windows):
Exception exceptions.WindowsError: WindowsError(32, 'Impossible to access the file. File is used by another process') in <bound method atlantic.__del__ of <atlantic.atlantic object at 0x0111A690>> ignored
I should add that I get this error with the scripts I tried to write for other newspapers too, but there it didn't prevent an LRF file from being written.
In this case, however, the LRF contains only the header and nothing else. Probably this has something to do with question 1)...
Any ideas?
Alessandro