#1
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2011
Device: Kindle 3
Need Help with Recipe
Hello. I'm trying to get a recipe created for KSL. I need a bit of help and it looks like this is the spot that's giving me the trouble:
The page loads and each of the news items is under a div labeled <div class="headlineQueueItem">. The problem is that each link to the actual news story uses the tiniest bit of Javascript via an anchor tag (<a ...>). Please see below:
<a onclick="s_objectID='Latest Local News 1 title'" href="?nid=148&sid=15103589">Alta High assistant principal takes different job</a>
All this really does is make it so that the part after the question mark in the href is appended to 'http://www.ksl.com/index.php?', as shown below:
http://www.ksl.com/index.php?nid=148&sid=15103589
Is there any way to make Python turn href="?nid=148&sid=15103589" into href="http://www.ksl.com/index.php?nid=148&sid=15103589" so I don't get dead links when the recipe is downloaded?
Thanks in advance.
James
P.S. Here's my recipe in case this is helpful:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1300058293(BasicNewsRecipe):
    title = u'KSL'
    oldest_article = 1  # in days
    max_articles_per_feed = 20
    # No trailing comma here: it would turn the dict into a one-element tuple.
    remove_tags_after = dict(name='div', attrs={'id': 'bodyCol1'})
    keep_only_tags = [dict(name='div', attrs={'id': 'bodyBlock'})]
    # Page furniture to strip from each article.
    remove_tags = [
        dict(name='table', attrs={'class': 'siteIndex'}),
        dict(name='div', attrs={'class': 'roundColWide'}),
        dict(name='div', attrs={'id': 'bodyCol2'}),
        dict(name='div', attrs={'id': 'bodyCol3'}),
        dict(name='div', attrs={'class': 'addthis_toolbox addthis_default_style'}),
        dict(name='embed', attrs={'id': 'p1'}),
        dict(name=['script', 'noscript', 'style']),
    ]
    feeds = [
        (u'Local News and Features', u'http://www.ksl.com/xml/148.rss'),
        (u'Consumer News', u'http://www.ksl.com/xml/172.rss'),
    ]
Last edited by kovidgoyal; 04-11-2011 at 05:46 PM.
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
I'd use preprocess_regexps to change the <a> tag.
http://calibre-ebook.com/user_manual...rocess_regexps
Edit: Actually, if that didn't work, I'd switch to preprocess_html and run a regex on the <a> tag. I can't recall if preprocess_regexps runs early enough in the process. And I'm not totally sure where your problem is: if it's in the RSS feed, then you'll need to work even earlier in the process. I'd do that by grabbing the feed page with parse_index and fixing the <a> links with a regex.
Last edited by Starson17; 04-12-2011 at 10:56 AM.
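For what it's worth, here is a minimal sketch of the regex idea, written as plain Python so it can be tried outside calibre. It assumes the broken links always look like href="?nid=...&sid=..." and that http://www.ksl.com/index.php is the right base, as described in the first post. The names BASE_URL, RELATIVE_HREF, absolutize and fix_links are made up for illustration. As noted above, it is not certain that preprocess_regexps runs early enough, or that the bad links are in the article pages rather than the feed, so treat this as a starting point only.

```python
import re

# Assumption: every broken link is a bare query string such as
# href="?nid=148&sid=15103589", and this is the page it belongs to.
BASE_URL = 'http://www.ksl.com/index.php'

# Matches href="?..." or href='?...'; group 1 is the quote, group 2 the query.
RELATIVE_HREF = re.compile(r'href=(["\'])\?([^"\']*)\1', re.IGNORECASE)


def absolutize(match):
    """Rebuild the href attribute with the base URL in front of the query."""
    quote, query = match.group(1), match.group(2)
    return 'href=%s%s?%s%s' % (quote, BASE_URL, query, quote)


def fix_links(html):
    """Return html with every href="?..." turned into an absolute URL."""
    return RELATIVE_HREF.sub(absolutize, html)


if __name__ == '__main__':
    sample = ('<a onclick="s_objectID=\'Latest Local News 1 title\'" '
              'href="?nid=148&sid=15103589">'
              'Alta High assistant principal takes different job</a>')
    print(fix_links(sample))
```

Inside a recipe, the same pair could be tried as preprocess_regexps = [(RELATIVE_HREF, absolutize)], since each entry there is a compiled regex plus a callable that takes the match object and returns the replacement string. If that turns out to run too late, calling fix_links on the page source from another hook would be the next thing to try.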