MobileRead Forums

rhuang76 · 02-20-2014, 07:25 AM

Hello, I am new to calibre.

I tested a script under Python 2.7, then ported its main function into a calibre recipe, but it doesn't work.

None of the URLs can be downloaded, even though they are all live. The log shows:
"Could not fetch link http://www.lepoint.fr/municipales-20...94120_1966.php
Failed to download article: Municipales 2014 - Béziers : Aboud-Ménard, un duel au coude-à-coude from http://www.lepoint.fr/municipales-2014/municipales-2014-beziers-aboud-menard-un-duel-au-coude-a-coude-20-02-2014-1794120_1966.php"

I know I am stupid, but I need some help.

Code:

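from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1392888003(BasicNewsRecipe):
    title          = u'Le Point'
    oldest_article = 3
    max_articles_per_feed = 10
    auto_cleanup = True

    feeds          = [(u'Actualit\xe9', u'http://www.lepoint.fr/rss.xml')]

    def preprocess_html(self, soup):
        # Rebuild a minimal XHTML page from the parts of the article we want to keep.
        html = u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n <html xmlns="http://www.w3.org/1999/xhtml"> \n  <head> \n <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        # str(...).decode('utf-8') is Python 2 only (the version the script was tested with).
        html += str(soup.title).decode('utf-8') + u'\n'
        html += u'</head>\n<body>'

        # Article title from the Open Graph metadata; skipped if the tag is missing.
        article_title_meta = soup.find('meta', attrs={'property': 'og:title'})
        if article_title_meta:
            html += u'<h1>' + article_title_meta['content'] + u'</h1>'

        # Lead image; emitted only when og:image exists, so no undefined variable is used.
        article_title_img_meta = soup.find('meta', attrs={'property': 'og:image'})
        if article_title_img_meta:
            html += u'<img src="' + article_title_img_meta['content'] + u'" />'

        # Article body; skipped if the page has no articleBody span.
        article_body = soup.find('span', attrs={'itemprop': 'articleBody'})
        if article_body:
            html += str(article_body).decode('utf-8')

        # Close the tags opened above so the result is well-formed.
        html += u'</body>\n</html>'
        return BeautifulSoup(html)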
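
To check the HTML-assembly step outside calibre, it can be pulled out into a plain function that takes strings that have already been extracted. This is only a minimal sketch: `build_article_html` is a hypothetical helper, not part of calibre, and it assumes the body markup is trusted and inserted verbatim.

```python
from xml.sax.saxutils import escape, quoteattr


def build_article_html(title, body_html, image_url=None):
    """Assemble a minimal XHTML page from already-extracted pieces.

    Hypothetical helper for testing outside calibre. `title` is escaped;
    `body_html` is assumed to be trusted markup and is inserted as-is.
    """
    parts = [
        u'<html xmlns="http://www.w3.org/1999/xhtml">',
        u'<head><title>%s</title></head>' % escape(title),
        u'<body>',
        u'<h1>%s</h1>' % escape(title),
    ]
    # Emit the image only when a URL was actually found.
    # quoteattr() escapes the value and adds the surrounding quotes.
    if image_url:
        parts.append(u'<img src=%s />' % quoteattr(image_url))
    # A missing body becomes an empty string instead of raising an error.
    parts.append(body_html or u'')
    parts.append(u'</body></html>')
    return u'\n'.join(parts)
```

Calling it with and without `image_url` shows whether a missing `og:image` tag can still break the page, without going through a full calibre download.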