04-11-2011, 04:08 PM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Mar 2011
Device: Kindle 3
Need Help with Recipe
Hello. I'm trying to get a recipe created for KSL. I need a bit of help and it looks like this is the spot that's giving me the trouble:
The page loads, and each of the news items is under a div labeled <div class="headlineQueueItem">. The problem is that each link to the actual news story uses the tiniest bit of Javascript via an anchor tag (<a ...>). Please see below:
<a onclick="s_objectID='Latest Local News 1 title'" href="?nid=148&sid=15103589">Alta High assistant principal takes different job</a>
All this really does is take the part after the question mark in href="?..." and paste it after the text 'http://www.ksl.com/index.php?', as shown below:
http://www.ksl.com/index.php?nid=148&sid=15103589
Is there any way to make Python turn href="?nid=148&sid=15103589" into href="http://www.ksl.com/index.php?nid=148&sid=15103589" so I don't get dead links when the recipe is downloaded?
Thanks in advance.
James
P.S. - Here's my recipe in case this is helpful:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1300058293(BasicNewsRecipe):
    title = u'KSL'
    oldest_article = 1
    max_articles_per_feed = 20

    # Remove everything that comes after the div with id "bodyCol1"
    remove_tags_after = dict(name='div', attrs={'id': 'bodyCol1'})
    # Keep only the main article block
    keep_only_tags = [dict(name='div', attrs={'id': 'bodyBlock'})]
    # Strip navigation, side columns, sharing widgets, and scripts
    remove_tags = [
        dict(name='table', attrs={'class': 'siteIndex'}),
        dict(name='div', attrs={'class': 'roundColWide'}),
        dict(name='div', attrs={'id': 'bodyCol2'}),
        dict(name='div', attrs={'id': 'bodyCol3'}),
        dict(name='div', attrs={'class': 'addthis_toolbox addthis_default_style'}),
        dict(name='embed', attrs={'id': 'p1'}),
        dict(name=['script', 'noscript', 'style']),
    ]

    feeds = [
        (u'Local News and Features', u'http://www.ksl.com/xml/148.rss'),
        (u'Consumer News', u'http://www.ksl.com/xml/172.rss'),
    ]
Last edited by kovidgoyal; 04-11-2011 at 04:46 PM. |
04-12-2011, 09:50 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
I'd use preprocess_regexps to change the <a> tag.
http://calibre-ebook.com/user_manual...rocess_regexps
Edit: Actually, if that doesn't work, I'd switch to preprocess_html and run a regex on the <a> tag. I can't recall whether preprocess_regexps runs early enough in the process. I'm also not totally sure where your problem is: if it's in the RSS feed, you'll need to work even earlier in the process. I'd do that by grabbing the feed page with parse_index and fixing the <a> links with a regex.
Last edited by Starson17; 04-12-2011 at 09:56 AM. |
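For illustration, here is a minimal sketch of the regex part of that suggestion. It runs on its own, without calibre, so the pattern can be tried against the sample link from the first post. The names BASE, QUERY_HREF and absolutize are made up for this example. It assumes every query-only link (href="?...") on the page should point at http://www.ksl.com/index.php, which is how the first post describes it, and it has not been tested against the live site.

```python
import re

# Assumed base URL, taken from the example in the first post.
BASE = 'http://www.ksl.com/index.php'

# Matches href="?..." or href='?...' (links that consist only of a query
# string) and captures the quote character and the query itself.
QUERY_HREF = re.compile(r"""href=(["'])\?([^"']*)\1""", re.IGNORECASE)


def absolutize(match):
    """Rebuild the href attribute with the base URL in front of the query."""
    quote, query = match.group(1), match.group(2)
    return 'href=%s%s?%s%s' % (quote, BASE, query, quote)


# Inside a recipe class, the same pair could be handed to calibre as:
#     preprocess_regexps = [(QUERY_HREF, absolutize)]
# Whether that runs early enough for this site is the open question above.

sample = ('<a onclick="s_objectID=\'Latest Local News 1 title\'" '
          'href="?nid=148&sid=15103589">'
          'Alta High assistant principal takes different job</a>')

fixed = QUERY_HREF.sub(absolutize, sample)
print(fixed)
```

Links that are already absolute do not start with "?" after the opening quote, so the pattern leaves them alone. If the regex stage turns out to run too late, the same QUERY_HREF.sub(absolutize, ...) call could be applied to the raw page text at an earlier point, as suggested above.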