Hi,
Sorry to bug you with what is probably a stupid error, but here's my problem:
I would like to post-process the HTML of the downloaded articles using this function (for now it should only print debugging messages):
Code:
def postprocess_html(self, soup, first):
    self.log('===== post process article')
    for tmp_link in soup.findAll('a', href=re.compile("[gtrj][0-9]+....html")):
        dummy = 0
        self.log('\t\t ====== found link: ' + self.tag_to_string(tmp_link).get('href'))
    return soup
This code should not corrupt the soup object in any way. However, I get the error message "Article download failed: <article url>". Strangely enough, if I remove the self.log call inside the loop, all articles are downloaded without any problem.
Very strange indeed.
Does anybody have an idea where I am going wrong?
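One thing that may be worth checking, as a sketch rather than a confirmed diagnosis: tag_to_string returns the link's plain text, and a plain string has no get method, so the expression inside the self.log call would raise an AttributeError. That the recipe framework then reports this exception as "Article download failed" is an assumption. The FakeTag class, the standalone tag_to_string function, and the sample href below are hypothetical stand-ins, so the snippet runs without calibre or BeautifulSoup:

```python
# Hypothetical stand-in for a BeautifulSoup <a> tag: attributes are read via .get().
class FakeTag:
    def __init__(self, text, attrs):
        self.text = text
        self.attrs = attrs

    def get(self, key, default=None):
        return self.attrs.get(key, default)


def tag_to_string(tag):
    # Stand-in for the recipe's tag_to_string: returns plain text, not a tag.
    return tag.text


link = FakeTag("Read more", {"href": "g12345.html"})

# What the loop in the post does: call .get() on the returned string.
try:
    tag_to_string(link).get("href")
    failure = None
except AttributeError as err:
    failure = str(err)

# Reading the attribute from the tag itself works.
href = link.get("href")

print("failure:", failure)
print("href:", href)
```

If this is indeed the cause, logging tmp_link.get('href') instead of self.tag_to_string(tmp_link).get('href') should avoid the exception.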
Thanks,
Jens