Hi!
Firstly, can I say what a fabulous bit of software Calibre is! Well done to Kovid for making life so much easier.
I'm in the same boat as the OP when downloading articles from Instapaper: the number of articles actually downloaded varies. For instance, the first time I downloaded a batch of articles the file size was 4.3 MB; a few minutes later, when I tried again, it was 0.8 MB and contained only a handful of articles. I tried again a few days later and got the same 0.8 MB file.
Do you think this is a Calibre issue or an Instapaper one? The recipe is as follows:
Code:
__license__ = 'GPL v3'
__copyright__ = '2009-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.instapaper.com
'''
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe
class Instapaper(BasicNewsRecipe):
    title                 = 'The KindleMagic Daily'
    __author__            = 'Unknown'
    description           = '''Personalized news feeds. Go to instapaper.com to
                               set up your news. Fill in your instapaper
                               username, and leave the password field
                               below blank.'''
    publisher             = 'Instapaper.com'
    category              = 'news, custom'
    oldest_article        = 50
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                         }

    feeds = [
              (u'Uncategorised articles', INDEX + u'/u')
             ,(u'Starred articles', INDEX + u'/starred')
             ,(u'News', INDEX + u'/u/folder/number_removed/news')
             ,(u'Sport', INDEX + u'/u/folder/number_removed/sport')
             ,(u'Technology', INDEX + u'/u/folder/number_removed/technology')
             ,(u'Gaming', INDEX + u'/u/folder/number_removed/gaming')
             ,(u'Comment', INDEX + u'/u/folder/number_removed/comment')
             ,(u'Gossip/Rubbish', INDEX + u'/u/folder/number_removed/gossip-rubbish')
            ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            # Log in through the first form on the Instapaper login page
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            # Each saved article is listed in a <div class="titleRow">
            for item in soup.findAll('div', attrs={'class':'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    title = self.tag_to_string(atag)
                    date = strftime(self.timefmt)
                    articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':description
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        # Fetch each article through Instapaper's text view
        return self.INDEX + '/text?u=' + urllib.quote(url)
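For anyone testing this by hand: print_version above sends each article through Instapaper's text view. Below is a minimal Python 3 sketch of the same URL building (the recipe itself runs on Python 2, where the call is urllib.quote). It assumes the /text?u= endpoint behaves the way the recipe expects, and the standalone print_version function is a stand-in for the recipe method, not part of Calibre's API. Pasting the resulting URL into a logged-in browser may help show whether Instapaper serves the full text for a given article.

```python
# Python 3 sketch of the recipe's print_version() URL building.
# Assumption: Instapaper's text view is INDEX + '/text?u=<quoted url>',
# exactly as the recipe assumes.
from urllib.parse import quote

INDEX = 'http://www.instapaper.com'

def print_version(url):
    # quote() leaves '/' unescaped by default (safe='/'), like Python 2's urllib.quote
    return INDEX + '/text?u=' + quote(url)

print(print_version('http://example.com/a b'))
# prints http://www.instapaper.com/text?u=http%3A//example.com/a%20b
```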
For info, I am running Calibre 0.7.29 on Linux Ubuntu 10.10.
Really hoping someone can help here folks!
Thanks in advance for your time
Cheers
Maxi (UK)
P.S. Not sure if this is relevant, but my work machine runs Ubuntu 10.04, and every time I download there I get all the articles (the full 4.3 MB), so it works fine on that machine. Could this be an Ubuntu 10.10 thing?
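One rough way to tell whether the missing articles come from Instapaper or from Calibre might be to save each folder page from a logged-in browser on both machines and count the entries parse_index() would find. The sketch below assumes, as the recipe does, that each article sits in a <div class="titleRow"> whose first link is the article URL. TitleRowCounter and count_entries are hypothetical helpers, and the sketch uses Python 3's standard html.parser rather than the BeautifulSoup calls Calibre uses, so it only roughly mirrors the recipe's matching.

```python
# Rough diagnostic: count the entries the recipe's parse_index() would pick up.
# Assumption: each article is a <div class="titleRow"> whose first <a> holds
# the article URL (this mirrors item.a / has_key('href') in the recipe).
from html.parser import HTMLParser

class TitleRowCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a titleRow div (tracks nested divs)
        self.got_link = False   # whether the current row's first <a> was seen
        self.rows = 0           # number of titleRow divs found
        self.links = []         # href of the first <a> in each row, if it has one

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1
            elif 'titleRow' in (attrs.get('class') or '').split():
                self.depth = 1
                self.rows += 1
                self.got_link = False
        elif tag == 'a' and self.depth and not self.got_link:
            self.got_link = True
            if 'href' in attrs:
                self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

def count_entries(html):
    parser = TitleRowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows, parser.links

sample = '''<div class="titleRow"><a href="http://example.com/1">One</a><div>desc</div></div>
<div class="titleRow"><a href="http://example.com/2">Two</a></div>
<div class="other"><a href="http://example.com/3">Three</a></div>'''

rows, links = count_entries(sample)
print(rows, links)
# prints 2 ['http://example.com/1', 'http://example.com/2']
```

To run it on a saved page, pass in the file contents, e.g. count_entries(open('folder.html', encoding='utf-8').read()), where folder.html is whatever name the page was saved under. If the counts already differ between the two machines, that would point towards what Instapaper served (or the login) rather than the recipe's parsing.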