Hello, I am new to Calibre.
I tested a script under Python 2.7, then ported its main function into a Calibre recipe, but it doesn't work.
None of the URLs can be downloaded, even though they are all live. The log shows:
"Could not fetch link
http://www.lepoint.fr/municipales-20...94120_1966.php
Failed to download article: Municipales 2014 - Béziers : Aboud-Ménard, un duel au coude-à-coude from http://www.lepoint.fr/municipales-2014/municipales-2014-beziers-aboud-menard-un-duel-au-coude-a-coude-20-02-2014-1794120_1966.php"
I know I am stupid, but I need some help.
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1392888003(BasicNewsRecipe):
    title = u'Le Point'
    oldest_article = 3
    max_articles_per_feed = 10
    auto_cleanup = True
    feeds = [(u'Actualit\xe9', u'http://www.lepoint.fr/rss.xml')]

    def preprocess_html(self, soup):
        # Build a new XHTML page from the article's Open Graph metadata and body
        html = u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        html += str(soup.title).decode('utf-8') + u'\n'
        html += u'</head>\n<body>\n'
        # Headline from <meta property="og:title" content="...">
        article_title_meta = soup.find('meta', attrs={'property': 'og:title'})
        article_title = article_title_meta['content']
        html += u'<h1>' + article_title + u'</h1>\n'
        # Optional lead image from <meta property="og:image" content="...">
        article_title_img_meta = soup.find('meta', attrs={'property': 'og:image'})
        if article_title_img_meta:
            article_title_img = article_title_img_meta['content']
            html += u'<img src="' + article_title_img + u'" />\n'
        # Article text lives in <span itemprop="articleBody">
        article_body = soup.find('span', attrs={'itemprop': 'articleBody'})
        html += str(article_body).decode('utf-8')
        html += u'\n</body>\n</html>'
        return BeautifulSoup(html)
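To narrow the problem down, the page-rebuilding logic of preprocess_html can be tried outside Calibre. Below is a rough Python 3 sketch that uses the standard library's html.parser in place of Calibre's BeautifulSoup; the names ArticleMetaParser and build_article_html are made up for illustration. It returns None instead of raising when the og:title meta tag is missing, closes head/body, and quotes the image src. My untested guess is that with auto_cleanup = True, Calibre cleans the raw page before preprocess_html sees it. If so, the meta tags may already be gone, and article_title_meta['content'] is then evaluated on None and raises. That would abort every article.

```python
# Hypothetical stand-alone sketch (Python 3, stdlib only) of the recipe's preprocess_html logic.
from html import escape
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collect og:title/og:image <meta> values and the text of <span itemprop="articleBody">."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.body_parts = []
        self._span_depth = 0  # > 0 while inside the articleBody span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'meta' and attrs.get('property') in ('og:title', 'og:image'):
            self.meta[attrs['property']] = attrs.get('content') or ''
        elif tag == 'span' and (self._span_depth or attrs.get('itemprop') == 'articleBody'):
            # Count nested spans so the matching </span> ends the body
            self._span_depth += 1

    def handle_endtag(self, tag):
        if tag == 'span' and self._span_depth:
            self._span_depth -= 1

    def handle_data(self, data):
        if self._span_depth:
            self.body_parts.append(data)


def build_article_html(raw_html):
    """Return a minimal XHTML page, or None if the page has no og:title."""
    parser = ArticleMetaParser()
    parser.feed(raw_html)
    parser.close()
    title = parser.meta.get('og:title')
    if title is None:
        # In the recipe, indexing the None result of soup.find() here raises TypeError
        return None
    parts = [
        '<html xmlns="http://www.w3.org/1999/xhtml"><head>',
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        '<title>%s</title></head><body>' % escape(title),
        '<h1>%s</h1>' % escape(title),
    ]
    image = parser.meta.get('og:image')
    if image:
        parts.append('<img src="%s" />' % escape(image))  # quoted, escaped attribute
    # Only the text of the body is kept; inner markup such as <b> is dropped
    parts.append('<p>%s</p>' % escape(''.join(parser.body_parts).strip()))
    parts.append('</body></html>')
    return ''.join(parts)


if __name__ == '__main__':
    sample = ('<html><head><meta property="og:title" content="Municipales 2014" />'
              '<meta property="og:image" content="http://example.com/a.jpg" /></head>'
              '<body><span itemprop="articleBody">Hello <b>world</b></span></body></html>')
    print(build_article_html(sample))
```

Feeding it a saved copy of one of the failing lepoint.fr pages would at least show whether the og:title tag is present in the raw HTML.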