#16
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: Sony PRS-650
I am not sure, actually.
With Leopard, the Instapaper recipe seems to work better: it syncs more than 10 articles, and it also syncs the starred ones. Even so, it does not sync everything. With Snow Leopard, on the other hand, it syncs exactly 10 articles, no more, no less. Does the recipe keep a memory of the articles it synced last time? If an article stays in Instapaper's unread section, will it be downloaded again every day?
#17
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2011
Device: Kindle 2
Fixed... I think

To try to narrow this down, I created a custom news source and then pasted sections of the script back in piece by piece (I'm sure there has to be an easier way to debug). Anyway, long story short: by omitting the last section, the recipe now works perfectly for me, grabbing both unread and starred articles. The omitted section was:

    def print_version(self, url):
        return self.INDEX + '/text?u=' + urllib.quote(url)

Here is the full script. All credit to the original creator, as this is essentially a cut and paste of his work.

Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title                 = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article        = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread',  u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'titleRow'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    title = self.tag_to_string(atag)
                    date = strftime(self.timefmt)
                    articles.append({
                        'title'       : title,
                        'date'        : date,
                        'url'         : url,
                        'description' : description
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
Moderator Notice
Code tags added for readability.

Last edited by Starson17; 03-11-2011 at 11:02 AM.
#18
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: Sony PRS-650
Kilgore3K, you might have actually solved it! I've tested it once, and it works. Let's see over the next few days whether it keeps working.
#19
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle
You, sir, are a gentleman and a scholar. Thank you so much.
#20
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
Cool, works for me too! THX
#21
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2011
Device: Kindle 2
Glad to be of help. Now if I can just find the time to read all the articles I keep saving...
#22
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Great recipe! Something that was long needed.
Many, many thanks!
#23
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2011
Device: none
The download works great so far, but is there a way to fetch the text-only version instead of the saved page in full?
#24
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
Quote:

I'm not sure, but I think it's done through the get_article_url function. The links to the processed text are pretty straightforward (instapaper.com/go/article_id/go), and you can see those links in the HTML code of the page; no script is used. So I think it shouldn't be very difficult to amend the recipe's code.
#25
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
You intentionally removed a piece of code that handled the text versions of the articles, and now you complain that it does not work?
#26
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
The real reason the recipe stopped working is that the structure of the site has changed. I'll see to it this week.
#27
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
I personally didn't remove anything. I only started using Instapaper recently, and that recipe was the only option available.
#28
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
OK, I played around a bit and created a recipe that fetches the plain-text versions of all the articles, straight out of Instapaper. Here is the code:

Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title                 = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article        = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread',  u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({
                        'url' : url
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url
The problem is that this particular tag contains no information about the title, date, or description. The latter two are not important to me personally, but the first one is definitely useful. So if you use the recipe like this, all the items in the TOC will be marked as "Unknown Article". Even the link itself can't be reused as a title, since Instapaper keeps them all as numerical values. Maybe there is a possibility to fetch the title out of the article itself?

Again, I have next to no knowledge of Python and only a pretty basic understanding of the recipe API. I'm trying my best, but without direction from somebody more experienced it's just random wandering in the woods.
#29
creator of calibre
Posts: 45,617
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can use the populate_article_metadata method to fill in the title from the actual article contents.
#30
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
OK, here is a pretty much usable recipe. It creates the newspaper straight from the Instapaper-processed texts. No articles should be omitted (unless the processing changes again).

It's pretty minimalistic: only the title appears in the TOC, with no date or article summary, since I don't use them. But I do encourage you to add this metadata or improve the recipe in some other way.

Code:
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title                 = u'Instapaper'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article        = 365
    max_articles_per_feed = 100
    no_stylesheets        = True
    remove_javascript     = True
    remove_tags = [
        dict(name='div', attrs={'id': 'text_controls_toggle'}),
        dict(name='script'),
        dict(name='div', attrs={'id': 'text_controls'}),
        dict(name='div', attrs={'id': 'editing_controls'})
    ]
    use_embedded_content  = False
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread',  u'http://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class': 'cornerControls'}):
                description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({
                        'url' : url
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('h1').contents[0].strip()
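One caveat with populate_article_metadata as written: soup.find('h1') returns None when a page has no h1 element, and contents[0] fails on an empty tag, so a single odd page can abort the whole download. A slightly more defensive title extraction might look like the standalone sketch below; the helper name and the fallback text are my own choices, and the soup object is only assumed to offer BeautifulSoup's find() API as used in the recipe above:

```python
# Defensive title extraction, sketched as a standalone helper. Everything
# here is illustrative; inside the recipe it would be called from
# populate_article_metadata, e.g. article.title = extract_title(soup).

def extract_title(soup, fallback=u'Untitled article'):
    h1 = soup.find('h1')
    if h1 is not None and h1.contents:
        first = h1.contents[0]
        # In BeautifulSoup, contents[0] may be a string or a nested tag;
        # only accept plain text, and fall back otherwise.
        if isinstance(first, str):
            text = first.strip()
            if text:
                return text
    return fallback
```

This way a page with no heading simply gets a placeholder title instead of raising an exception mid-download.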