11-24-2013, 03:37 PM
oecherprinte
Device: Kindle3 3G, Kindle Paperwhite 2
Article download fails if I use the postprocess_html function

Hi,

Sorry to bug you with what is probably a stupid error, but here's my problem:

I would like to post-process the HTML of the downloaded articles using this function (for now it should only print debugging messages):

Code:
    def postprocess_html(self, soup, first):
        self.log('===== post process article')
        for tmp_link in soup.findAll('a', href=re.compile("[gtrj][0-9]+....html")):
            dummy = 0
            self.log('\t\t ====== found link: ' + self.tag_to_string(tmp_link).get('href'))
        return soup
This code should not modify the soup object in any way. However, I get the error message "Article download failed: <article url>". Strangely enough, if I remove the self.log call in the loop, all articles download without any problem.

Very strange indeed.
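A likely cause (a hedged reading, not confirmed in this thread): BasicNewsRecipe.tag_to_string() returns plain text rather than a tag, and a string has no .get() method, so self.tag_to_string(tmp_link).get('href') should raise AttributeError as soon as the loop finds a matching link. That would explain why removing the self.log line makes the failure disappear, since it is the only call that breaks. Inside the recipe, reading the attribute from the tag itself, e.g. tmp_link.get('href', ''), should avoid it. Because calibre cannot be imported standalone, the sketch below mimics the corrected loop with the standard library's html.parser instead of BeautifulSoup's findAll; LinkCollector, matching_links and log_matching_links are hypothetical helpers, not calibre API.

```python
import re
from html.parser import HTMLParser

# Same pattern as the recipe, as a raw string. Each unescaped "." matches
# any single character, so "....html" means "any four characters, then html".
LINK_PATTERN = re.compile(r"[gtrj][0-9]+....html")


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for soup.findAll('a', href=LINK_PATTERN).

    Collects the href of every <a> tag whose href contains a match.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; the href is read from the
        # tag itself, which is what tmp_link.get('href') does in the recipe.
        href = dict(attrs).get("href")
        # search() finds the pattern anywhere in the href, not only at the start.
        if href is not None and LINK_PATTERN.search(href):
            self.hrefs.append(href)


def matching_links(html):
    """Return the hrefs of all matching <a> tags in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def log_matching_links(html, log=print):
    """Mirror of the corrected loop: log each matching link's href."""
    log("===== post process article")
    for href in matching_links(html):
        log("\t\t ====== found link: " + href)
```

For example, given '<a href="g1234.html">x</a><a href="g123.html">y</a>', only g1234.html is logged: because every "." in the pattern is a wildcard, the pattern needs four arbitrary characters before "html", and "g123.html" is too short. If a literal dot was intended, r"[gtrj][0-9]+\.html" is the usual way to write it.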

Does anybody have an idea what I am doing wrong?

Thanks,

Jens