#1
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2010
Location: Brisbane, AU
Device: Kindle
Dealing with double quotes " in URL
Hi guys,
I am totally new to recipes. Last night I tried to create a recipe to fetch Vietnamese news from this website: http://tuoitre.vn/Rss/Index.html
I think the recipe works fine until: Quote:
Below is my recipe:
Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class AdvancedUserRecipe1285594488(BasicNewsRecipe):
    title = u'Tuoi Tre News'
    __author__ = 'kinurev'
    description = 'News from Tuoitre in Vietnamese. '
    timefmt = ' [%a, %d %b, %Y]'
    oldest_article = 7
    max_articles_per_feed = 20
    no_stylesheets = True
    #delay = 1
    use_embedded_content = False
    encoding = 'utf8'
    publisher = 'Tuoitre'
    category = 'news, Vietnam'
    language = 'vi'
    publication_type = 'newsportal'
    extra_css = 'body{font-family: Verdana, Helvetica, Arial, sans-serif} .pHead{ font-size: medium; color: #5F5F5F; font-weight: bold } .pTitle{ font-size: large; font-weight: bold; margin-top: 0 }'

    preprocess_regexps = [
        (re.compile(r'<P class=pBody>------------------------------.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</body>'),
    ]

    remove_tags_before = dict(id='divContent')
    remove_tags_after = dict(id='divContent')
    remove_attributes = ['width','height']

    feeds = [
        (u'Ch\xednh tr\u1ecb - X\xe3 h\u1ed9i', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=3'),
        (u'Th\u1ebf gi\u1edbi', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=2'),
        (u'Nh\u1ecbp s\u1ed1ng tr\u1ebb', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=7'),
        (u'Gi\xe1o d\u1ee5c', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=13'),
        (u'Th\u1ec3 thao', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=14'),
        (u'V\u0103n h\xf3a - Gi\u1ea3i tr\xed', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=10'),
        (u'Nh\u1ecbp s\u1ed1ng s\u1ed1', u'http://tuoitre.vn/RssFeeds.aspx?ChannelID=16')
    ]
#2
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
The usual way of dealing with it would be to use &quot; in place of the double quote. But I'm not a Calibre expert, so I can't be sure it would work in this case.
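For readers unsure what that substitution looks like, here is a minimal sketch contrasting two standard encodings of a double quote: the HTML entity (for text inside markup) and percent-encoding (for characters inside a URL). The sample string is only a fragment of the article URL from this thread, and which form Calibre's fetcher actually needs is not confirmed here.

```python
import html
from urllib.parse import quote

# Fragment of the article path from this thread, used only as sample input.
raw = 'cong-an-"canh-giu"-doan'

# HTML entity escaping: intended for text placed inside HTML markup.
as_entity = html.escape(raw)

# Percent-encoding: intended for characters inside a URL path.
as_percent = quote(raw)

print(as_entity)
print(as_percent)
```

The first form replaces each double quote with `&quot;`, the second with `%22`.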
#3
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Maybe this will work:
Code:
return url.replace('"', '%22')
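A bare `return` needs a function around it. The sketch below assumes the line was meant for a `print_version(self, url)` override, a hook that `BasicNewsRecipe` lets a recipe define to rewrite each article URL before download. The class name `QuoteSafeUrls` is hypothetical and stands in for the recipe class so the snippet runs without Calibre installed.

```python
class QuoteSafeUrls(object):
    """Hypothetical stand-in for the recipe class (an assumption, not from the thread).

    In a real recipe the method below would be defined on the
    BasicNewsRecipe subclass itself.
    """

    def print_version(self, url):
        # Percent-encode literal double quotes so the URL is valid to request.
        return url.replace('"', '%22')


article = ('http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/'
           'Kiem-lam-va-cong-an-"canh-giu"-doan-xe-tai-cho-cay-canh.html')
fixed = QuoteSafeUrls().print_version(article)
print(fixed)
```

URLs without a double quote pass through unchanged, so the override should be harmless for the rest of the feed.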
#4
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2010
Location: Brisbane, AU
Device: Kindle
Quote:
Quote:
#5
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Maybe Kovid or Starson or someone else will chime in and answer this for you and me. I don't see why the code below wouldn't work, but that's not to say it will.
Spoiler:
Basically, the above SHOULD look for all anchor tags (links) in your soup and then do a regular-expression search for every " inside the href attribute. If it finds one, it replaces it with %22, which is the URL percent-encoding for a double quote. Again, this may not work: I had nothing to test it on other than your recipe, and the recipe didn't generate any links with " in them, so I wasn't really able to test it. Give it a shot and see what happens for you.
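The spoiler's code did not survive in this copy of the thread, so the following is only a sketch of the approach described, not the poster's original. The helper name `escape_quotes_in_hrefs` is hypothetical. It accepts any iterable of objects that support `.get('href')` and item assignment, which parsed BeautifulSoup tags do; in a recipe it would presumably be called from `preprocess_html(self, soup)` with `soup.findAll('a', href=True)`. Plain dicts stand in for the tags so the snippet runs on its own.

```python
import re

# Matches a literal double quote anywhere in a string.
DOUBLE_QUOTE = re.compile(r'"')


def escape_quotes_in_hrefs(anchors):
    """Percent-encode double quotes in each anchor's href, in place.

    Returns the number of anchors that were changed.
    """
    changed = 0
    for anchor in anchors:
        href = anchor.get('href')
        if href and DOUBLE_QUOTE.search(href):
            anchor['href'] = DOUBLE_QUOTE.sub('%22', href)
            changed += 1
    return changed


# Plain dicts stand in for parsed <a> tags (an assumption for this demo).
links = [
    {'href': 'http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/'
             'Kiem-lam-va-cong-an-"canh-giu"-doan-xe-tai-cho-cay-canh.html'},
    {'href': 'http://tuoitre.vn/RssFeeds.aspx?ChannelID=3'},
]
count = escape_quotes_in_hrefs(links)
print(count, links[0]['href'])
```

Returning the count makes it easy to log whether any link in a given article actually needed fixing, which helps with the testing problem mentioned above.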
#6
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
Did you try my suggestion of using &quot;? I don't know if it will work, but surely it's worth a try.
#7
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2010
Location: Brisbane, AU
Device: Kindle
Thanks, TonytheBookworm, for helping me with this. The script seems to work for now but, like you said, just when I needed a URL with double quotes to try, I could not find one.
Well, good news: while writing this I found out that the link http://tuoitre.vn/Chinh-tri-Xa-hoi/4...-cay-canh.html and the link http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/Kiem-lam-va-cong-an-"canh-giu"-doan-xe-tai-cho-cay-canh.html both worked in my browser (Chrome), and that the script worked fine with or without the code you suggested. It seems that the problem solved itself (hopefully for good). I honestly don't know how it happened, but thanks a lot for your help anyway. I'll still keep your code in the script, just in case.
@Mike L: thanks for your suggestion as well, but I have very little knowledge of Python, so I just don't know how to use &quot;.
Quote: