Old 04-11-2011, 04:08 PM   #1
UtahJames
Need Help with Recipe

Hello. I'm trying to create a recipe for KSL and need a bit of help. This looks like the spot that's giving me trouble:

The page loads, and each news item sits under a div labeled <div class="headlineQueueItem">. The problem is that each link to the actual news story uses the tiniest bit of JavaScript via an anchor tag <a ...>. Please see below:

<a onclick="s_objectID='Latest Local News 1 title'" href="?nid=148&amp;sid=15103589">Alta High assistant principal takes different job</a>

All this really does is append the part of the href after the question mark to 'http://www.ksl.com/index.php?', as shown below:
http://www.ksl.com/index.php?nid=148&sid=15103589

Is there any way to make Python turn href="?nid=148&amp;sid=15103589" into href="http://www.ksl.com/index.php?nid=148&amp;sid=15103589" so I don't get dead links when the recipe is downloaded?

Thanks in advance.

James

P.S. - Here's my recipe in case this is helpful:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1300058293(BasicNewsRecipe):
    title          = u'KSL'
    oldest_article = 1  # days
    max_articles_per_feed = 20

    # Drop everything that follows the main body column.
    remove_tags_after  = dict(name='div',attrs={'id':'bodyCol1'})

    # Keep only the article body block.
    keep_only_tags = [dict(name='div',attrs={'id':'bodyBlock'})]
    # Strip the site index, side columns, sharing widgets and scripts.
    remove_tags    = [
        dict(name='table',attrs={'class':'siteIndex'}),
        dict(name='div',attrs={'class':'roundColWide'}),
        dict(name='div',attrs={'id':'bodyCol2'}),
        dict(name='div',attrs={'id':'bodyCol3'}),
        dict(name='div',attrs={'class':'addthis_toolbox addthis_default_style'}),
        dict(name='embed',attrs={'id':'p1'}),
        dict(name=['script', 'noscript', 'style']),
        ]

    feeds          = [
        (u'Local News and Features', u'http://www.ksl.com/xml/148.rss'),
        (u'Consumer News', u'http://www.ksl.com/xml/172.rss'),
        ]

Last edited by kovidgoyal; 04-11-2011 at 04:46 PM.
Old 04-12-2011, 09:50 AM   #2
Starson17
I'd use preprocess_regexps to change the <a> tag.

http://calibre-ebook.com/user_manual...rocess_regexps
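
A rough sketch of what such a rule might look like, written so it can be tried outside calibre. In a recipe, preprocess_regexps is a list of (compiled pattern, replacement function) pairs. The KSL_BASE constant and the apply_rules helper are made up for this illustration, and the pattern assumes the links always look like the href="?..." example from the first post.

```python
import re

# Assumed base URL, taken from the example link in the first post.
KSL_BASE = 'http://www.ksl.com/index.php'

# Same shape as a recipe's preprocess_regexps attribute:
# a list of (compiled pattern, replacement function) pairs.
preprocess_regexps = [
    # Match href="?..." and put the base URL in front of the query string.
    (re.compile(r'href="\?', re.IGNORECASE),
     lambda match: 'href="' + KSL_BASE + '?'),
]


def apply_rules(html, rules=preprocess_regexps):
    """Hypothetical stand-in for calibre: apply each rule to the raw HTML in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


sample = ('<a onclick="s_objectID=\'Latest Local News 1 title\'" '
          'href="?nid=148&amp;sid=15103589">Alta High assistant principal '
          'takes different job</a>')
fixed = apply_rules(sample)
print(fixed)
```

Inside the recipe class, only the preprocess_regexps list would be needed. Links that are already absolute do not match the pattern and are left alone.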

Edit: Actually, if that didn't work, I'd switch to preprocess_html and run a regex on the <a> tag. I can't recall if preprocess_regexps runs early enough in the process.
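
For the regex-on-the-<a>-tag idea, here is a minimal sketch that works on an HTML string using only the standard library. It relies on the fact that href="?..." is an ordinary relative URL, so urljoin can resolve it against the page address. The function name and KSL_BASE are invented for the example. In a real recipe, preprocess_html is handed the parsed page (a soup object) rather than a string, so the same urljoin call would be applied to each anchor's href attribute instead.

```python
import re
from urllib.parse import urljoin

# Assumed page address, taken from the example link in the first post.
KSL_BASE = 'http://www.ksl.com/index.php'

# Capture the part of an <a> tag before the href value, the value, and the closing quote.
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def absolutize_hrefs(html, base=KSL_BASE):
    """Rewrite every double-quoted <a href="..."> so it is absolute relative to base."""
    def fix(match):
        # urljoin keeps the base path when the reference is only a query string.
        return match.group(1) + urljoin(base, match.group(2)) + match.group(3)
    return HREF_RE.sub(fix, html)


sample = ('<a onclick="s_objectID=\'Latest Local News 1 title\'" '
          'href="?nid=148&amp;sid=15103589">Alta High assistant principal '
          'takes different job</a>')
result = absolutize_hrefs(sample)
print(result)
```

Compared with hard-coding the prefix, urljoin also copes with other kinds of relative links and leaves absolute ones untouched.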

And I'm not totally sure where your problem is. If it's in the RSS feed, then you'll need to work even earlier in the process. I'd do that by grabbing the feed page with parse_index and fixing the <a> links with a regex.
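
To give an idea of the parse_index route, the sketch below builds the kind of structure parse_index is expected to return: a list of (feed title, list of article dicts) pairs, where each dict carries 'title', 'url', 'description' and 'date'. It uses the standard library HTML parser instead of a regex and runs on a hand-written fragment. The class name, build_index, the feed title and the sample markup are all assumptions for illustration. Fetching the real page, and limiting the scan to the headlineQueueItem divs, is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed page address, taken from the example link in the first post.
KSL_BASE = 'http://www.ksl.com/index.php'


class HeadlineLinkParser(HTMLParser):
    """Collect one article dict for every <a href=...> that has link text."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get('href')
        if tag == 'a' and href:
            # The parser has already turned &amp; into &, so the URL is ready to use.
            self._href = urljoin(KSL_BASE, href)
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:
                self.articles.append({'title': title, 'url': self._href,
                                      'description': '', 'date': ''})
            self._href = None


def build_index(html, feed_title='KSL Local News'):
    """Return data shaped like a parse_index result for one feed."""
    parser = HeadlineLinkParser()
    parser.feed(html)
    parser.close()
    return [(feed_title, parser.articles)]


sample = ('<div class="headlineQueueItem">'
          '<a onclick="s_objectID=\'Latest Local News 1 title\'" '
          'href="?nid=148&amp;sid=15103589">Alta High assistant principal '
          'takes different job</a></div>')
index = build_index(sample)
print(index)
```

In a recipe, the body of build_index would live in a parse_index(self) method that first downloads the headline page.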

Last edited by Starson17; 04-12-2011 at 09:56 AM.
All times are GMT -4.