I've been trying to fix this recipe of mine for a while now, but I still can't get it to download multipage articles. At least it now pulls the first page and lets me go to the other pages.
My recipe:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed


class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 6
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = False
    use_embedded_content = True
    # Follow links one level deep, but only those that look like "next page" links
    recursions = 1
    match_regexps = [r'page_number=[0-9]+']

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss_feeds.asp'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss_simple.asp?f_n=476&f_ln=Software'),
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so articles can be removed from the original list
            for article in feed.articles[:]:
                self.log('article.title is: ', article.title)
                # Drop healthcare articles (case-sensitive match on title or URL)
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds
Here is the part of an article's HTML that contains the link to the next page (the <a ... title="Next"> element whose href ends in page_number=2):
Spoiler:
<div class="divsplitter" style="height: 1.25em;"></div>
<div style="height: 1.666em;"><div style="float: right;"><span class="smaller blue">
<img src="http://img.deusm.com/informationweek/slideshow-arrow-gray-left.png" alt="Previous" style="width: 1.666em; height: 1.666em; border: 0; float: left; margin-right: 0.666em;" />
<div style="float: left; height: 1.416666em; padding-top: .25em;">1 of 3</div>
<a href="http://www.informationweek.com/mobil...pping-guide-8-tips/d/d-id/1112842?page_number=2" title="Next" ><img src="http://img.deusm.com/informationweek/slideshow-arrow-black-right.png" alt="Next" style="width: 1.666em; height: 1.666em; border: 0; float: right; margin-left: 0.666em;" /></a>
</span></div></div>
<div class="divsplitter" style="height: .666em;"></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="#msgs">Comment</a> | </span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="email.asp" onclick="window.open('/email.asp?url='+encodeURIComponent(thispage_sharelink)+'&title='+encodeURIComponent(document.title),'',' '); return false;">Email This</a> | </span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="/mobile/mobile-devices/tablet-shopping-guide-8-tips/d/d-id/1112842?print=yes">Print</a> | </span></div>
<div style="float: left; margin-right: 2px;"><span class="smaller blue allcaps"><a href="http://www.informationweek.com/rss_simple.asp">RSS</a></span></div>
<div class="divsplitter" style="height: .666em;"></div>
<div class="divsplitter" style="height: 4px; background: #aaa;"></div>
<div class="divsplitter" style="height: .666em;"></div>
<div id="more-insights"><span class="smaller strong red allcaps">More Insights</span></div>
<div class="divsplitter" style="height: 0.25em;"></div>
<div class="more-insights-item"><span class="small strong darkgray">Webcasts</span>
<div class="divsplitter" style="height: 0.25em;"></div>
<div xmlns:a10="http://www.w3.org/2005/Atom">
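As a sanity check on the pattern itself, here is a small standalone sketch (not part of the recipe) that tests the match_regexps expression against the hrefs in the HTML above. It assumes calibre applies the pattern as a search anywhere in the URL rather than as a full match; the helper name is_next_page is made up for illustration, and the first URL is left truncated exactly as the forum shows it.

```python
import re

# Pattern copied from match_regexps in the recipe.
NEXT_PAGE = re.compile(r'page_number=[0-9]+')

# hrefs copied from the HTML snippet; the first one is truncated ("...")
# as displayed, which does not affect matching on the query string.
hrefs = [
    'http://www.informationweek.com/mobil...pping-guide-8-tips/d/d-id/1112842?page_number=2',
    '/mobile/mobile-devices/tablet-shopping-guide-8-tips/d/d-id/1112842?print=yes',
    'http://www.informationweek.com/rss_simple.asp',
]


def is_next_page(href):
    """Hypothetical helper: True if the href carries a page_number=N parameter."""
    return NEXT_PAGE.search(href) is not None


matches = [is_next_page(h) for h in hrefs]
for href, matched in zip(hrefs, matches):
    print(matched, href)
```

Only the "Next" link matches, so the regular expression itself looks right for picking out the pagination links.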
A sample article that has multiple pages:
http://www.informationweek.com/mobil...ek_sitedefault
What changes do I need to make to my recipe to get this to work right?