#16 | Junior Member | Posts: 3 | Karma: 10 | Join Date: Jan 2011 | Device: Sony PRS-650
I am not sure, actually.
With Leopard, the Instapaper recipe seems to work better: it syncs more than 10 articles, and it also syncs the starred ones. However, it still does not sync everything. With Snow Leopard, on the other hand, it syncs exactly 10 articles, no more, no less. Does the recipe remember which articles it synced last time? If an article remains in Instapaper's unread section, will it be downloaded again every day?
#17 | Junior Member | Posts: 3 | Karma: 10 | Join Date: Mar 2011 | Device: Kindle 2
Fixed... I think

To try to narrow this down, I created a custom news source and then pasted sections of the script back in piece by piece (I'm sure there has to be an easier way to debug).

Anyway, long story short: by omitting the last section, it now works perfectly for me, grabbing both unread and starred articles. The section I removed was:

Code:
def print_version(self, url):
    return self.INDEX + '/text?u=' + urllib.quote(url)

Here is the full script. All credit to the original creator of the script, as this is essentially a cut and paste of his work.

Code:
import urllib

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    title = self.tag_to_string(atag)
                    date = strftime(self.timefmt)
                    articles.append({
                        'title': title,
                        'date': date,
                        'url': url,
                        'description': description
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

Moderator Notice: Code tags added for readability.

Last edited by Starson17; 03-11-2011 at 10:02 AM.
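As an aside, there is an easier way to debug than pasting sections back in one at a time: calibre can run a recipe directly from the command line with verbose output. Something along these lines should work (the exact flags may vary by calibre version, and myrecipe.recipe is a placeholder filename):

Code:
ebook-convert myrecipe.recipe output.epub --test -vv

If I remember right, the --test switch limits the download to a couple of feeds and a couple of articles per feed, and -vv prints detailed progress, so you can see exactly where a recipe falls over.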
#18 | Junior Member | Posts: 3 | Karma: 10 | Join Date: Jan 2011 | Device: Sony PRS-650
Kilgore3K, you might have actually solved it! I've tested it once, and it works. Let's see over the next few days whether it keeps working.
#19 | Junior Member | Posts: 4 | Karma: 10 | Join Date: Dec 2010 | Device: Kindle
You, sir, are a gentleman and a scholar. Thank you so much.
#20 | Junior Member | Posts: 6 | Karma: 10 | Join Date: Jan 2011 | Device: Kindle 3
Cool, works for me too! THX
#21 | Junior Member | Posts: 3 | Karma: 10 | Join Date: Mar 2011 | Device: Kindle 2
Glad to be of help. Now if I can just find the time to read all the articles I keep saving...
#22 | Connoisseur | Posts: 57 | Karma: 10 | Join Date: Feb 2010 | Device: Kindle Paperwhite 1
Great recipe! Something that was long needed! Many, many thanks!
#23 | Junior Member | Posts: 1 | Karma: 10 | Join Date: Mar 2011 | Device: none
The download works great so far, but is there a way to fetch the text-only version instead of the full saved page?
#24 | Connoisseur | Posts: 57 | Karma: 10 | Join Date: Feb 2010 | Device: Kindle Paperwhite 1
I'm not sure, but I think it's done through the get_article_url function. The links to the processed text are pretty straightforward (instapaper.com/go/article_id), and you can see those links in the HTML source of the page; no script is used. So I think it shouldn't be very difficult to amend the recipe's code.
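If those /go/ links really are in the page source, one possible amendment (just an untested sketch) is to collect the relative hrefs in parse_index instead of the original article URLs, and then let print_version turn each one back into an absolute URL pointing at the processed text:

Code:
def print_version(self, url):
    # Assumes 'url' is a relative '/go/<article_id>' href scraped from
    # the index page; prefixing the site root yields the
    # Instapaper-processed text version of the article.
    return 'http://www.instapaper.com' + url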
#25 | Guru | Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage
You intentionally removed a piece of code that handled the text versions of the articles, and now you complain that it does not work?
#26 | Guru | Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage
The real reason the recipe stopped working is that the structure of the site has changed. I'll see to it this week.
#27 | Connoisseur | Posts: 57 | Karma: 10 | Join Date: Feb 2010 | Device: Kindle Paperwhite 1
I personally didn't remove anything. I only started using Instapaper recently, and that recipe was the only option available.
#28 | Connoisseur | Posts: 57 | Karma: 10 | Join Date: Feb 2010 | Device: Kindle Paperwhite 1
OK, I played around a bit and created a recipe that fetches the plain-text versions of all the articles, straight out of Instapaper. Here is the code:

Code:
import urllib

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({'url': url})
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

The problem is that this particular tag contains no information about title, date, or description. The latter two are not important to me personally, but the first one definitely is. So if you use the recipe like this, you will get every item in the TOC marked as "Unknown Article". Even the link itself can't be reused as a title, since Instapaper uses numeric values for them all. Maybe there is a possibility to fetch the title out of the article itself?

Again, I possess next to no knowledge of Python and only a basic understanding of the recipe API. I'm trying my best, but without direction from somebody more experienced it's just random wandering in the woods.
#29 | creator of calibre | Posts: 45,190 | Karma: 27110894 | Join Date: Oct 2006 | Location: Mumbai, India | Device: Various
You can use the populate_article_metadata method to fill in the title from the actual article contents.
#30 | Connoisseur | Posts: 57 | Karma: 10 | Join Date: Feb 2010 | Device: Kindle Paperwhite 1
OK, here is a pretty much usable recipe. It creates newspapers right out of the Instapaper-processed texts. No articles should be omitted (unless the processing changes again).

It's pretty minimalistic: only the title in the TOC, no date or article summary, since I do not use them. But I do encourage you to add this metadata or improve the recipe in some other way.

Code:
import urllib

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    remove_tags = [
        dict(name='div', attrs={'id': 'text_controls_toggle'}),
        dict(name='script'),
        dict(name='div', attrs={'id': 'text_controls'}),
        dict(name='div', attrs={'id': 'editing_controls'})
    ]
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({'url': url})
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('h1').contents[0].strip()
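One fragile spot worth flagging in the version above: soup.find('h1').contents[0].strip() will raise an AttributeError on any processed page that lacks an <h1>, and can also fail when the tag's first child is not a plain string. A slightly more defensive variant (a sketch, untested against the live site):

Code:
def populate_article_metadata(self, article, soup, first):
    # Use the page's <h1> as the article title when present; otherwise
    # keep whatever title the recipe already assigned.
    h1 = soup.find('h1')
    if h1 is not None:
        title = self.tag_to_string(h1).strip()
        if title:
            article.title = title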