09-29-2010, 06:46 AM
kinurev
Join Date: Sep 2010
Location: Brisbane, AU
Device: Kindle
Dealing with double quotes (") in a URL

Hi guys,

I am totally new to calibre recipes. Last night I tried to create a recipe to fetch Vietnamese news from this website: http://tuoitre.vn/Rss/Index.html

The recipe seems to work fine until it hits this error:

Quote:
Could not fetch link http://tuoitre.vn/Van-hoa-Giai-tri/4...u-thuong”.html
Traceback (most recent call last):
File "site-packages/calibre/web/fetch/simple.py", line 422, in process_links
File "site-packages/calibre/web/fetch/simple.py", line 221, in fetch_url
FetchError: Bad Request

My guess is that the culprit is the closing double quote character (”) near the end of the URL. Can any of you please help me with this? Thanks a lot.
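If the Bad Request really is caused by the unencoded ” in the link (an assumption, not something the error message proves), one option is to percent-encode the URL before calibre fetches it. Below is a minimal sketch of that idea; `quote_article_url` is a hypothetical helper name, and the example URL is made up to have the same shape as the failing one. It uses Python 3's `urllib.parse`; the 2010-era calibre ran on Python 2, where the equivalents are `urllib.quote` and `urlparse.urlsplit`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_article_url(url):
    """Hypothetical helper: percent-encode unsafe characters in a URL.

    Characters that are already legal URL syntax, and existing %XX escapes,
    are left alone, so running it on an already-encoded URL is harmless.
    Non-ASCII characters are encoded as UTF-8 bytes (quote's default).
    """
    parts = urlsplit(url)
    # '%' is treated as safe so existing escapes are not double-encoded.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=-._~")
    query = quote(parts.query, safe="=&%/:?+,;@")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    # Made-up URL ending in the same curly quote as the failing link.
    print(quote_article_url("http://tuoitre.vn/Van-hoa-Giai-tri/example-u-thuong\u201d.html"))
    # prints http://tuoitre.vn/Van-hoa-Giai-tri/example-u-thuong%E2%80%9D.html
```

Inside the recipe, one untested way to wire this in would be to override `get_article_url(self, article)` (a hook `BasicNewsRecipe` provides for customizing article URLs), get the original link from `BasicNewsRecipe.get_article_url(self, article)`, and return it passed through the helper.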

Below is my recipe:

Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class AdvancedUserRecipe1285594488(BasicNewsRecipe):
	title = u'Tuoi Tre News'
	__author__             = 'kinurev'
	description = 'News from Tuoitre in Vietnamese. '
	timefmt = ' [%a, %d %b, %Y]'
	oldest_article = 7
	max_articles_per_feed = 20
	no_stylesheets         = True
	#delay                  = 1
	use_embedded_content   = False
	encoding               = 'utf8'
	publisher              = 'Tuoitre'
	category               = 'news, Vietnam'
	language               = 'vi'
	publication_type       = 'newsportal'
	extra_css              = 'body{font-family: Verdana, Helvetica, Arial, sans-serif} .pHead{ font-size: medium; color: #5F5F5F; font-weight: bold } .pTitle{ font-size: large; font-weight: bold; margin-top: 0 }'
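	# Drop everything from the dashed separator paragraph to the end of the body
	preprocess_regexps = [
		(re.compile(r'<P class=pBody>------------------------------.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</body>'),
	]
	# Keep only the divContent element (the article itself) and its contents
	remove_tags_before = dict(id='divContent')
	remove_tags_after = dict(id='divContent')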
	remove_attributes = ['width','height']

	feeds = [
		(u'Ch\xednh tr\u1ecb  - X\xe3 h\u1ed9i', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=3'),
		(u'Th\u1ebf gi\u1edbi', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=2'),
		(u'Nh\u1ecbp s\u1ed1ng tr\u1ebb', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=7'),
		(u'Gi\xe1o d\u1ee5c', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=13'),
		(u'Th\u1ec3 thao', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=14'),
		(u'V\u0103n h\xf3a  - Gi\u1ea3i tr\xed', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=10'),
		(u'Nh\u1ecbp s\u1ed1ng s\u1ed1', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=16'),
	]
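The `preprocess_regexps` entry above is meant to cut everything from the dashed separator paragraph down to `</body>`. A quick way to check what it removes, outside calibre, is to run the same pattern and replacement function over a sample page. The sample HTML below is invented, not taken from tuoitre.vn, and `strip_footer` is a hypothetical name; applying each (regex, function) pair as a plain `sub` is, as far as I know, roughly what calibre does with these pairs before parsing.

```python
import re

# Separator copied from the recipe: a pBody paragraph that starts with dashes.
SEPARATOR = '<P class=pBody>------------------------------'

# Same pattern text and flags as the recipe's preprocess_regexps entry.
PATTERN = re.compile(SEPARATOR + r'.*</body>', re.DOTALL | re.IGNORECASE)


def strip_footer(html):
    """Apply the recipe's (regex, function) pair as a single substitution."""
    return PATTERN.sub(lambda match: '</body>', html)


# Invented sample page: article text, then the separator and footer junk.
sample = ('<html><body><P class=pBody>Article text.</P>'
          + SEPARATOR + '</P>\n<P>Related links</P></body></html>')

print(strip_footer(sample))
# prints <html><body><P class=pBody>Article text.</P></body></html>
```

Because of `re.DOTALL`, the `.*` also crosses line breaks, and because it is greedy it runs to the last `</body>` in the page; `re.IGNORECASE` lets it match lowercase `<p class=pbody>` markup as well.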