MobileRead Forums - View Single Post

Maxiboost · 11-26-2010, 08:44 AM

Hi!

Firstly, can I say what a fabulous bit of software Calibre is! Well done to Kovid for making life so much easier.

I'm in the same boat as the OP when trying to download articles from Instapaper: the number of articles actually downloaded varies. For instance, the first time I downloaded a batch of articles the file was 4.3MB; a few minutes later, when I tried again, it was only 0.8MB and contained just a handful of articles. I tried again a few days later and got the same 0.8MB file.
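One way to tell whether the missing articles are absent from Instapaper's own pages or are being dropped during parsing is to save an Instapaper list page from the browser and count the entries that the `titleRow` selector in the recipe below would pick up. The following is a minimal sketch using only Python 3's standard `html.parser` (not the BeautifulSoup the recipe uses). `TitleRowCounter` and `count_articles` are hypothetical helper names. The sketch assumes each entry is a `<div class="titleRow">` whose first link is the article, which is what the recipe expects. Note that the saved page reflects your browser session, not Calibre's login, so treat the result as a rough comparison only.

```python
import sys
from html.parser import HTMLParser


class TitleRowCounter(HTMLParser):
    """Collect the href of the first <a> inside each <div class="titleRow">.

    Loosely mirrors the recipe's soup.findAll('div', attrs={'class': 'titleRow'})
    followed by item.a: if that first link has no href, the entry is skipped.
    Assumes the class attribute is exactly 'titleRow'.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0           # <div> nesting depth inside the current titleRow
        self.want_link = False   # True until the first <a> of this titleRow is seen
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # nested div inside a titleRow
            elif attrs.get('class') == 'titleRow':
                self.depth = 1
                self.want_link = True
        elif tag == 'a' and self.depth and self.want_link:
            self.want_link = False  # only the first <a> counts, like item.a
            if 'href' in attrs:
                self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.want_link = False


def count_articles(html):
    """Return the article links the titleRow selector would find in html."""
    parser = TitleRowCounter()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: python count_titlerows.py saved_instapaper_page.html
    with open(sys.argv[1], encoding='utf-8', errors='replace') as f:
        links = count_articles(f.read())
    print(len(links), 'articles found')
    for href in links:
        print(href)
```

If the count from a saved page is consistently higher than what ends up in the downloaded file, that points at the fetch/login side rather than at Instapaper's listing.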

Do you think this is a Calibre issue or an Instapaper one? The recipe is as follows:

Code:

__license__   = 'GPL v3'
__copyright__ = '2009-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.instapaper.com
'''

import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Instapaper(BasicNewsRecipe):
    title                 = 'The KindleMagic Daily'
    __author__            = 'Unknown'
    description           = '''Personalized news feeds. Go to instapaper.com to
                               set up your news. Fill in your Instapaper
                               username, and leave the password field
                               below blank.'''
    publisher             = 'Instapaper.com'
    category              = 'news, custom'
    oldest_article        = 50
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        }

    feeds = [
              (u'Uncategorised articles', INDEX + u'/u')
             ,(u'Starred articles', INDEX + u'/starred')
             ,(u'News', INDEX + u'/u/folder/number_removed/news')
             ,(u'Sport', INDEX + u'/u/folder/number_removed/sport')
             ,(u'Technology', INDEX + u'/u/folder/number_removed/technology')
             ,(u'Gaming', INDEX + u'/u/folder/number_removed/gaming')
             ,(u'Comment', INDEX + u'/u/folder/number_removed/comment')
             ,(u'Gossip/Rubbish', INDEX + u'/u/folder/number_removed/gossip-rubbish')
            ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url         = atag['href']
                    title       = self.tag_to_string(atag)
                    date        = strftime(self.timefmt)
                    articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':description
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return self.INDEX + '/text?u=' + urllib.quote(url)
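
Each article link found on a list page is then fetched through Instapaper's text view rather than directly. As a quick check on individual articles, the URL that `print_version` builds can be reproduced outside Calibre. This is a standalone Python 3 sketch, not part of Calibre's API: the recipe itself is Python 2, where the same function is `urllib.quote`; in Python 3 it lives at `urllib.parse.quote`, and both leave `/` unescaped by default. Pasting the result into a logged-in browser shows whether that article's text view loads.

```python
from urllib.parse import quote

INDEX = 'http://www.instapaper.com'


def print_version(url):
    # Same construction as the recipe's print_version: the article URL is
    # percent-encoded (':' '?' '=' etc. escaped, '/' kept) and passed as ?u=
    return INDEX + '/text?u=' + quote(url)


print(print_version('http://example.com/a?b=1'))
# http://www.instapaper.com/text?u=http%3A//example.com/a%3Fb%3D1
```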

For info, I am running Calibre 0.7.29 on Linux Ubuntu 10.10.

Really hoping someone can help here folks!

Thanks in advance for your time

Cheers
Maxi (UK)

PS: not sure if this is relevant, but on my work machine I'm running Ubuntu 10.04, and every time I download I get all the articles (i.e. the 4.3MB file), so it works fine there. Could this be an Ubuntu 10.10 thing?
