MobileRead Forums - View Single Post - Calibre + Instapaper not downloading all articles!

Kilgore3K · 03-09-2011, 02:49 PM

To try to narrow this down, I created a custom news source and pasted the script back in one section at a time (I'm sure there has to be an easier way to debug).

Long story short: with the last section omitted, it now works perfectly for me and grabs both unread and starred articles. This is the section I left out:
    def print_version(self, url):
        return self.INDEX + '/text?u=' + urllib.quote(url)
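
For anyone wondering what that section does: calibre downloads whatever URL print_version returns instead of the link found in the index, so here every article link was being rewritten to point at Instapaper's text view. Below is a rough standalone sketch of that rewrite, not part of the recipe. The function name text_view_url and the example.com address are made up for illustration, and it uses Python 3's urllib.parse.quote in place of the urllib.quote call the recipe uses. I have not worked out why this rewrite made some articles go missing.

```python
from urllib.parse import quote  # Python 3 home of the old urllib.quote

INDEX = 'http://www.instapaper.com'  # same value as INDEX in the recipe


def text_view_url(url):
    """Rewrite an article URL the way the omitted print_version did."""
    # quote() percent-encodes reserved characters such as ':', '?', '='
    # and spaces, but leaves '/' alone by default.
    return INDEX + '/text?u=' + quote(url)


# Hypothetical article URL, for illustration only.
print(text_view_url('http://example.com/a b?x=1'))
# prints http://www.instapaper.com/text?u=http%3A//example.com/a%20b%3Fx%3D1
```

With the method left out, calibre fetches the article link exactly as it appears on the Instapaper index page.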

Here is the full script. All credit goes to the original creator, as this is essentially a cut and paste of his work.

Code:

import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title          = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'



    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred'),
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url         = atag['href']
                    title       = self.tag_to_string(atag)
                    date        = strftime(self.timefmt)
                    articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':description
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

Moderator Notice
Code tags added for readability.