11-30-2013, 11:20 PM
Camper65
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
InformationWeek recipe still not fully working

I've been trying to fix this recipe of mine for a while now, but I still can't get it to pull in multi-page articles. At least it now pulls the first page and lets me go to the other pages.

My recipe

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 6  # days
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = False
    # True: build each article from the RSS item's content instead of
    # downloading the article page itself.
    use_embedded_content = True
    # Follow one level of links on each article page, but only links whose
    # URL matches one of these patterns (meant for the "Next" page links).
    recursions = 1
    match_regexps = [r'page_number=[0-9]+']

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss_feeds.asp'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss_simple.asp?f_n=476&f_ln=Software'),
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so removing items doesn't skip any.
            for article in feed.articles[:]:
                self.log('article.title is:', article.title)
                # Drop healthcare stories (title check is case-insensitive).
                if 'healthcare' in article.title.lower() or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds
The part of an article page's HTML that holds the link to the next page (the "Next" link ending in ?page_number=2):

Code:
<div class="divsplitter" style="height: 1.25em;"></div>
<div style="height: 1.666em;">
<div style="float: right;">
<span class="smaller blue">
<img src="http://img.deusm.com/informationweek/slideshow-arrow-gray-left.png" alt="Previous" style="width: 1.666em; height: 1.666em; border: 0; float: left; margin-right: 0.666em;" />
<div style="float: left; height: 1.416666em; padding-top: .25em;">1 of 3</div>
<a href="http://www.informationweek.com/mobil...pping-guide-8-tips/d/d-id/1112842?page_number=2" title="Next" ><img src="http://img.deusm.com/informationweek/slideshow-arrow-black-right.png" alt="Next" style="width: 1.666em; height: 1.666em; border: 0; float: right; margin-left: 0.666em;" /></a>
</span>
</div></div>
<div class="divsplitter" style="height: .666em;"></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="#msgs">Comment</a> &nbsp;|&nbsp;</span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="email.asp" onclick="window.open('/email.asp?url='+encodeURIComponent(thispage_sharelink)+'&title='+encodeURIComponent(document.title),'',' '); return false;">Email This</a> &nbsp;|&nbsp;</span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="/mobile/mobile-devices/tablet-shopping-guide-8-tips/d/d-id/1112842?print=yes">Print</a> &nbsp;|&nbsp;</span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="http://www.informationweek.com/rss_simple.asp">RSS</a></span></div>
<div class="divsplitter" style="height: .666em;"></div>
<div class="divsplitter" style="height: 4px; background: #aaa;"></div>
<div class="divsplitter" style="height: .666em;"></div>
<div id="more-insights"><span class="smaller strong red allcaps">More Insights</span></div>
<div class="divsplitter" style="height: 0.25em;"></div>
<div class="more-insights-item"><span class="small strong darkgray">Webcasts</span>
<div class="divsplitter" style="height: 0.25em;"></div>
<div xmlns:a10="http://www.w3.org/2005/Atom">
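As a sanity check: the next-page link in that markup is the <a> tag with title="Next", and its href ends in ?page_number=2, so the match_regexps pattern in the recipe should match it. The sketch below tests that outside calibre, using only the Python standard library and a trimmed copy of the tag quoted above (the URL truncation is kept as-is). The names NextLinkFinder and find_next_links are made up for illustration; they are not calibre APIs.

```python
from __future__ import print_function

import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 (calibre's Python in 2013)

# Same pattern as match_regexps in the recipe.
NEXT_PAGE_RE = re.compile(r'page_number=[0-9]+')

# Trimmed copy of the pager link quoted from the article page
# (the "..." truncation in the URL is from the original post).
SNIPPET = ('<a href="http://www.informationweek.com/mobil...pping-guide-8-'
           'tips/d/d-id/1112842?page_number=2" title="Next" >'
           '<img src="http://img.deusm.com/informationweek/'
           'slideshow-arrow-black-right.png" alt="Next" /></a>')


class NextLinkFinder(HTMLParser):
    """Hypothetical helper: collects the href of every <a title="Next">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.next_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('title') == 'Next' and attrs.get('href'):
            self.next_links.append(attrs['href'])


def find_next_links(html):
    """Return the hrefs of all "Next" links found in an HTML string."""
    parser = NextLinkFinder()
    parser.feed(html)
    parser.close()
    return parser.next_links


if __name__ == '__main__':
    for href in find_next_links(SNIPPET):
        # Prints the link and whether the recipe's pattern would accept it.
        print(href, bool(NEXT_PAGE_RE.search(href)))
```

If the pattern matches here but calibre still doesn't follow the link, two settings are worth testing one at a time (both are guesses, not confirmed causes): use_embedded_content = True makes calibre build each article from the RSS text instead of downloading the page that contains this link, and auto_cleanup may strip the pager before the links are collected.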


Sample article that has multiple pages:
http://www.informationweek.com/mobil...ek_sitedefault

What changes do I need to make to my recipe to get this working right?
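Another route avoids following pages at all: each article page also has a Print link (the ...?print=yes href in the markup quoted above). Assuming that print view carries the whole article on a single page (not confirmed), calibre's print_version(self, url) recipe hook could send every article to it. Below is only the URL rewrite, written as a plain function so it can be tried outside calibre; replacing the whole query string with print=yes simply copies the form of that Print link, and the example.com URL is hypothetical.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2 (calibre's Python in 2013)


def print_version(url):
    """Map an article URL to its (assumed) single-page print view.

    Assumption: the print view is the same path with the query string
    replaced by print=yes, as in the page's own Print link. This also
    drops page_number and any other query parameters, plus any fragment.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, 'print=yes', ''))


if __name__ == '__main__':
    # Hypothetical URL, used only to show the rewrite.
    print(print_version('http://www.example.com/some-story/d/d-id/123?page_number=2'))
    # -> http://www.example.com/some-story/d/d-id/123?print=yes
```

Inside the recipe, the same two lines would become the body of a print_version(self, url) method, together with use_embedded_content = False so that calibre downloads the rewritten URL instead of using the RSS text; recursions and match_regexps would then no longer be needed.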