Hello. I'm trying to get a recipe created for KSL. I need a bit of help and it looks like this is the spot that's giving me the trouble:
The CSS loads the page and each of the news items is under a div labeled:
<div class="headlineQueueItem">. The problem is that each of the links to the actual news story uses the tiniest bit of JavaScript via an anchor tag <a ....?>. Please see below:
<a onclick="s_objectID='Latest Local News 1 title'" href="?nid=148&sid=15103589">Alta High assistant principal takes different job</a>
All this really does is make it so that the part after the question mark in
href="?..." is appended to the text 'http://www.ksl.com/index.php?', as shown below:
http://www.ksl.com/index.php?nid=148&sid=15103589
Is there any way to make Python turn href="?nid=148&sid=15103589" into href="http://www.ksl.com/index.php?nid=148&sid=15103589" so I don't get dead links when the recipe is downloaded?
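For illustration, the rewrite described above could be sketched in plain Python, outside of any recipe, roughly as follows. This is only a sketch under assumptions: it uses the standard library rather than anything calibre-specific, it assumes every affected link has a double-quoted href that starts with a question mark, and the names BASE_URL and absolutize_hrefs are made up for this example.

```python
import re
from urllib.parse import urljoin

# Base address taken from the example URL above (assumed to be the same for every link).
BASE_URL = 'http://www.ksl.com/index.php'


def absolutize_hrefs(html, base=BASE_URL):
    """Rewrite every href="?..." attribute in html into an absolute URL."""
    def fix(match):
        # urljoin keeps the base path and swaps in the query string from the link.
        return 'href="%s"' % urljoin(base, match.group(1))

    # Only touch hrefs whose value begins with '?'; other links are left alone.
    return re.sub(r'href="(\?[^"]*)"', fix, html)


sample = ('<a onclick="s_objectID=\'Latest Local News 1 title\'" '
          'href="?nid=148&sid=15103589">Alta High assistant principal takes different job</a>')
print(absolutize_hrefs(sample))
```

Inside a recipe, a helper like this would presumably have to be called from whichever hook sees the page HTML before conversion; that wiring is not shown here.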
Thanks in advance.
James
P.S. - Here's my recipe in case this is helpful:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1300058293(BasicNewsRecipe):
    title = u'KSL'
    oldest_article = 1
    max_articles_per_feed = 20
    # Drop everything that follows the main article column
    remove_tags_after = dict(name='div', attrs={'id': 'bodyCol1'})
    # Keep only the main body block of each page
    keep_only_tags = [dict(name='div', attrs={'id': 'bodyBlock'})]
    # Strip navigation, side columns, sharing widgets, and scripts
    remove_tags = [
        dict(name='table', attrs={'class': 'siteIndex'}),
        dict(name='div', attrs={'class': 'roundColWide'}),
        dict(name='div', attrs={'id': 'bodyCol2'}),
        dict(name='div', attrs={'id': 'bodyCol3'}),
        dict(name='div', attrs={'class': 'addthis_toolbox addthis_default_style'}),
        dict(name='embed', attrs={'id': 'p1'}),
        dict(name=['script', 'noscript', 'style']),
    ]
    feeds = [
        (u'Local News and Features', u'http://www.ksl.com/xml/148.rss'),
        (u'Consumer News', u'http://www.ksl.com/xml/172.rss'),
    ]