Hi guys,
I am totally new to recipes. Last night I tried to create a recipe to fetch Vietnamese news from this website:
http://tuoitre.vn/Rss/Index.html
I think the recipe works fine until it hits this error:
Quote:
Could not fetch link http://tuoitre.vn/Van-hoa-Giai-tri/4...u-thuong”.html
Traceback (most recent call last):
File "site-packages/calibre/web/fetch/simple.py", line 422, in process_links
File "site-packages/calibre/web/fetch/simple.py", line 221, in fetch_url
FetchError: Bad Request
My guess is that the culprit is the curly double-quote character (”) in the URL. Can any of you please help me with this? Thanks a lot.
Below is my recipe:
Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class AdvancedUserRecipe1285594488(BasicNewsRecipe):
    title = u'Tuoi Tre News'
    __author__ = 'kinurev'
    description = 'News from Tuoitre in Vietnamese. '
    timefmt = ' [%a, %d %b, %Y]'
    oldest_article = 7
    max_articles_per_feed = 20
    no_stylesheets = True
    #delay = 1
    use_embedded_content = False
    encoding = 'utf8'
    publisher = 'Tuoitre'
    category = 'news, Vietnam'
    language = 'vi'
    publication_type = 'newsportal'
    extra_css = 'body{font-family: Verdana, Helvetica, Arial, sans-serif} .pHead{ font-size: medium; color: #5F5F5F; font-weight: bold } .pTitle{ font-size: large; font-weight: bold; margin-top: 0 }'
    preprocess_regexps = [
        (re.compile(r'<P class=pBody>------------------------------.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</body>'),
    ]
    remove_tags_before = dict(id='divContent')
    remove_tags_after = dict(id='divContent')
    remove_attributes = ['width','height']
    feeds = [
        (u'Ch\xednh tr\u1ecb - X\xe3 h\u1ed9i', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=3'),
        (u'Th\u1ebf gi\u1edbi', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=2'),
        (u'Nh\u1ecbp s\u1ed1ng tr\u1ebb', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=7'),
        (u'Gi\xe1o d\u1ee5c', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=13'),
        (u'Th\u1ec3 thao', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=14'),
        (u'V\u0103n h\xf3a - Gi\u1ea3i tr\xed', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=10'),
        (u'Nh\u1ecbp s\u1ed1ng s\u1ed1', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=16')
    ]
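
If the curly quote in the URL really is what triggers the "Bad Request", one thing I could try is percent-encoding the non-ASCII characters before the link is fetched. Below is a minimal sketch of that idea only, not a confirmed fix. It assumes Python 3's urllib.parse, the helper name quote_non_ascii is made up for illustration and is not part of calibre, and the example.com URL is a placeholder rather than the real article link.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_non_ascii(url):
    """Percent-encode unsafe characters in the path and query of a URL.

    Hypothetical helper (not a calibre function). Non-ASCII characters
    such as the curly quote are encoded as UTF-8 percent escapes.
    """
    parts = urlsplit(url)
    # Leave '/' and any existing '%' escapes alone in the path.
    path = quote(parts.path, safe="/%")
    # Leave the usual query separators and existing escapes alone.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Placeholder URL containing a right double quotation mark (U+201D).
print(quote_non_ascii("http://example.com/news/some-title\u201d.html"))
```

My understanding is that a helper like this could be called from an overridden get_article_url in the recipe, but I have not verified that this is the right place to hook it in.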