MobileRead Forums - View Single Post - Article download fails if I use postprocess_html function

oecherprinte · 11-24-2013, 03:37 PM

Hi,

Sorry to bug you with what is probably a stupid error, but here's my problem:

I would like to postprocess the HTML of the downloaded articles using this function (for now it should only print debugging messages):

Code:

    def postprocess_html(self, soup, first):
        self.log('===== post process article')
        for tmp_link in soup.findAll('a', href=re.compile("[gtrj][0-9]+....html")):
            dummy = 0
            self.log('\t\t ====== found link: ' + self.tag_to_string(tmp_link).get('href'))

        return soup

This code should not corrupt the soup object in any way. However, I get the error message "Article download failed: <article url>". Strangely enough, if I remove the self.log call inside the loop, all articles are downloaded without any problem.

Very strange indeed.

Does anybody have an idea where I am wrong?

Thanks,

Jens
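
A likely cause, sketched under assumptions: tag_to_string returns the link's text as a plain string, and a string has no get method, so the self.log line inside the loop would raise AttributeError, which calibre presumably reports as a failed article download. The snippet below reproduces this outside calibre. FakeTag and the standalone tag_to_string are hypothetical stand-ins written for this illustration, not calibre's or BeautifulSoup's real classes.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup <a> tag (illustration only)."""

    def __init__(self, text, attrs):
        self.text = text
        self.attrs = attrs

    def get(self, key, default=None):
        # Mirrors the dict-style attribute lookup a real tag offers.
        return self.attrs.get(key, default)


def tag_to_string(tag):
    """Stand-in for the recipe helper: returns the tag's text as a plain str."""
    return tag.text


link = FakeTag("Read more", {"href": "g123.html"})

# What the posted loop does: call .get() on the string returned by tag_to_string.
try:
    tag_to_string(link).get("href")
    failed = False
except AttributeError:
    # str objects have no .get(), so the lookup fails here.
    failed = True

# Reading the attribute from the tag itself avoids the error.
href = link.get("href")
```

If that assumption holds, changing the log line to read the attribute from the tag, for example self.log('\t\t ====== found link: ' + tmp_link.get('href', '')), should let the articles download again.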
