Old 01-10-2011, 05:46 PM   #16
matznet
Junior Member
matznet began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: sony prs-650
I am not sure, actually.
With Leopard, the Instapaper recipe seems to work better: it syncs more than 10 articles, and it also syncs the starred ones. However, it still does not sync all of them.
On the other hand, with Snow Leopard it syncs exactly 10 articles, no more, no less.

Does the recipe remember which articles it synced last time? If an article remains in the Instapaper unread section, will it be downloaded every day?
Old 03-09-2011, 01:49 PM   #17
Kilgore3K
Junior Member
Kilgore3K began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Mar 2011
Device: Kindle 2
Fixed... I think

To try and narrow this down, I created a custom news source and then, piece by piece, cut and pasted sections of the script back in (I'm sure there has to be an easier way to debug).

Anyway, long story short: with the last section omitted, it now works perfectly for me, grabbing both unread and starred articles. This is the section I left out:
def print_version(self, url):
    return self.INDEX + '/text?u=' + urllib.quote(url)
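
For reference, the omitted method wraps each saved link in Instapaper's text view by URL-encoding it into the `u` parameter. Below is a minimal standalone sketch of just that step, assuming Python 3 (where `urllib.quote` lives at `urllib.parse.quote`). The function name `text_version_url` and the example.com address are made up for illustration and are not part of the recipe:

```python
from urllib.parse import quote

INDEX = 'http://www.instapaper.com'

def text_version_url(url):
    # Same string the recipe's print_version builds:
    # site root + '/text?u=' + percent-encoded original link.
    # quote() leaves '/' alone by default and encodes ':', '?', '=' and spaces.
    return INDEX + '/text?u=' + quote(url)

# Hypothetical saved link, only to show the encoding
print(text_version_url('http://example.com/a b?x=1'))
```

Printing the result for a few of your own saved links is a quick way to check whether the text view still answers at that address.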

Here is the full script. All credit goes to the original creator of the script, as this is essentially a cut and paste of his work.
Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title          = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'



    feeds          = [(u'Instapaper Unread', u'http://www.instapaper.com/u'), (u'Instapaper Starred', u'http://www.instapaper.com/starred')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
               br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url         = atag['href']
                    title       = self.tag_to_string(atag)
                    date        = strftime(self.timefmt)
                    articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':description
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
Moderator Notice
Code tags added for readability.

Last edited by Starson17; 03-11-2011 at 10:02 AM.
Old 03-11-2011, 03:25 AM   #18
matznet
Junior Member
matznet began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: sony prs-650
Kilgore3K, you might have actually solved it! I've tested it once, and it works. Let's see over the next few days whether it keeps working.
Old 03-11-2011, 10:55 PM   #19
zach382
Junior Member
zach382 began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle
You sir are a gentleman and a scholar. Thank you so much.
Old 03-18-2011, 04:03 AM   #20
Gomez
Junior Member
Gomez began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
Cool, works for me too! THX
Old 03-18-2011, 01:39 PM   #21
Kilgore3K
Junior Member
Kilgore3K began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Mar 2011
Device: Kindle 2
First time's the charm

Glad to be of help. Now if I can just find the time to read all the articles I keep saving.
Old 03-19-2011, 10:23 AM   #22
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Great recipe! Something that was long needed!
Many, many thanks!
Old 03-28-2011, 07:16 AM   #23
abracadabra
Junior Member
abracadabra began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Mar 2011
Device: none
Download works great so far, but is there a way to fetch the text-only version instead of the full saved page?
Old 03-30-2011, 10:32 AM   #24
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Quote:
Originally Posted by abracadabra
Download works great so far, but is there a way to fetch the text-only version instead of the full saved page?
+1. I didn't notice at first that it fetches only the original source content; I had assumed the recipe accessed the processed text. This greatly diminishes the value of the recipe.

I'm not sure, but I think it's done through the get_article_url function.
Links to the processed text are pretty straightforward: instapaper.com/go/article_id/go
and you can see those links in the HTML code of the page; no script is used. So I think it shouldn't be very difficult to amend the recipe's code.
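
To illustrate the idea, here is a rough sketch that pulls such links out of a page using only the Python 3 standard library. The sample markup, the class name `GoLinkCollector` and the article ids are invented for the example; the real page layout may differ, and an actual recipe would work on the soup calibre hands it instead:

```python
from html.parser import HTMLParser

class GoLinkCollector(HTMLParser):
    """Collects href values of <a> tags whose path starts with /go/."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; value may be None
        href = dict(attrs).get('href')
        if href and href.startswith('/go/'):
            self.links.append(href)

# Made-up markup: two processed-text links and one link to an original source
sample = '''
<a href="/go/111/go">Text</a>
<a href="http://example.com/story">A story</a>
<a href="/go/222/go">Text</a>
'''

collector = GoLinkCollector()
collector.feed(sample)
print(collector.links)
```

Filtering on the link prefix rather than on the surrounding markup is just one option; it keeps the sketch independent of whatever div the site currently wraps the links in.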
Old 03-31-2011, 07:16 AM   #25
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
You intentionally removed a piece of code that handled text versions of the articles and now complain that it does not work?
Old 03-31-2011, 07:19 AM   #26
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
The real reason the recipe stopped working is that the structure of the site has changed. I'll see to that this week.
Old 03-31-2011, 09:02 AM   #27
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
I personally didn't remove anything. I've only started using Instapaper recently, and that recipe was the only option available.
Old 04-01-2011, 03:30 PM   #28
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Ok. I played around a bit and created a recipe that fetches the plain-text versions of all the articles, straight out of Instapaper. Here is the code:

Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title          = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'


    feeds          = [(u'Instapaper Unread', u'http://www.instapaper.com/u'), (u'Instapaper Starred', u'http://www.instapaper.com/starred')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
               br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url         = atag['href']
                    articles.append({
                                     'url'        :url
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url): 
        return 'http://www.instapaper.com' + url
The only thing that has changed is basically the div tag that wraps the link to the article.
The problem is that this particular tag contains no information about the title, date or description. The latter two are not important to me personally, but the title is definitely useful.
So if you use the recipe like this, all items in the TOC will be marked as Unknown Article. Even the link itself can't be reused as a title, since Instapaper's links are all numeric.
Maybe it is possible to fetch the title out of the article itself?


Again, I have next to no knowledge of Python and only a pretty basic understanding of the recipe API. I'm trying my best, but without direction from somebody more experienced it's just random wandering in the woods.
Old 04-01-2011, 03:32 PM   #29
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can use the populate_article_metadata method to fill in the title from the actual article contents.
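
A rough sketch of what that could look like. `populate_article_metadata(self, article, soup, first)` is the hook on `BasicNewsRecipe`; the helper below is plain Python 3 so it can be tried outside calibre. The names `FirstH1` and `first_h1_text`, the fallback string and the sample HTML are made up for illustration, and the sketch assumes the article title sits in the first `<h1>` of the processed page:

```python
from html.parser import HTMLParser

class FirstH1(HTMLParser):
    """Grabs the text of the first <h1> element and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h1' and not self.done:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == 'h1' and self.in_h1:
            self.in_h1 = False
            self.done = True   # later <h1> tags are skipped

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)

def first_h1_text(html, fallback='Unknown Article'):
    # Returns the whitespace-normalised heading, or the fallback if none exists
    parser = FirstH1()
    parser.feed(html)
    title = ' '.join(''.join(parser.parts).split())
    return title or fallback

# Hypothetical article pages
print(first_h1_text('<html><body><h1> Some <em>saved</em> story </h1><p>Body</p></body></html>'))
print(first_h1_text('<html><body><p>No heading here</p></body></html>'))
```

Inside a recipe the same idea would set `article.title` from the soup's first `h1`, ideally after checking that the tag is actually there.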
Old 04-01-2011, 06:14 PM   #30
Dereks
Connoisseur
Dereks began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Ok. Here is a pretty usable recipe. It builds the newspaper straight from the Instapaper-processed texts. No articles should be omitted (unless the processing changes again).
It's pretty minimalistic: only the title in the TOC, no date or article summary, since I do not use them. But I do encourage you to add this metadata or improve the recipe in some other way.

Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title                 = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article        = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    remove_javascript     = True
    remove_tags           = [
         dict(name='div', attrs={'id':'text_controls_toggle'})
        ,dict(name='script')
        ,dict(name='div', attrs={'id':'text_controls'})
        ,dict(name='div', attrs={'id':'editing_controls'})
    ]
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'


    feeds          = [(u'Instapaper Unread', u'http://www.instapaper.com/u'), (u'Instapaper Starred', u'http://www.instapaper.com/starred')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
               br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url         = atag['href']
                    articles.append({
                                     'url'        :url
                                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url): 
        return 'http://www.instapaper.com' + url

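    def populate_article_metadata(self, article, soup, first):
        # Take the title from the first <h1> of the processed text, if there is one;
        # otherwise keep whatever title calibre already assigned.
        h1 = soup.find('h1')
        if h1 is not None:
            article.title = self.tag_to_string(h1).strip()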
All times are GMT -4.