Thanks Kovid!
This is where I am so far...
Code:
import urllib

from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title                 = u'InstapaperAuto'
    __author__            = 'Darko Miletic'
    publisher             = 'Instapaper.com'
    category              = 'info, custom, Instapaper'
    oldest_article        = 365
    max_articles_per_feed = 100
    auto_cleanup          = True
    reverse_article_order = True
    needs_subscription    = True
    INDEX                 = u'http://www.instapaper.com'
    LOGIN                 = INDEX + u'/user/login'

    # Older pages first, so the articles read in chronological order
    feeds = [
        (u'Instapaper Unread - Pg. 6', u'http://www.instapaper.com/u/6'),
        (u'Instapaper Unread - Pg. 5', u'http://www.instapaper.com/u/5'),
        (u'Instapaper Unread - Pg. 4', u'http://www.instapaper.com/u/4'),
        (u'Instapaper Unread - Pg. 3', u'http://www.instapaper.com/u/3'),
        (u'Instapaper Unread - Pg. 2', u'http://www.instapaper.com/u/2'),
        (u'Instapaper Unread - Pg. 1', u'http://www.instapaper.com/u/1'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' %
                                 (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            # form_key is saved for the bulk-archive POST in cleanup()
            self.myFormKey = soup.find('input', attrs={'name': 'form_key'})['value']
            for item in soup.findAll('div', attrs={'class': 'cornerControls'}):
                atag = item.a
                if atag and atag.has_key('href'):
                    articles.append({'url': atag['href']})
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def cleanup(self):
        # Archive everything once the download has finished
        params = urllib.urlencode(dict(form_key=self.myFormKey, submit="Archive All"))
        self.browser.open("http://www.instapaper.com/bulk-archive", params)

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('title').contents[0].strip()

    def postprocess_html(self, soup, first_fetch):
        # Repeat the article title as a heading at the top of the story
        for link_tag in soup.findAll(attrs={"id": "story"}):
            link_tag.insert(0, '<h1>' + soup.find('title').contents[0].strip() + '</h1>')
        return soup
This is Darko's recipe that I modified.
Changes:
- I added feeds for 6 unread pages instead of 1. I only have 5 pages at the moment, but adding a sixth leaves room in case I get more. When I open the file on my Kindle only 5 sections are displayed, so empty ones are omitted. I prefer 5 sections of 40 articles each to 1 section of 200 articles.
- Added auto_cleanup = True. This decreases the size of the download from 4 MB to 2.7 MB, and there are far fewer useless photos.
- Implemented the "Archive All" modification that cendalc/banjopicker created (https://www.mobileread.com/forums/sho...8&postcount=13).
Update - Added reverse_article_order = True (credit: cendalc) and switched the order of the feeds so that older articles appear first. That way reading can be done in chronological order.
Comments:
I sort of patched this together with trial and error. The parts from def parse_index to the end still confuse me.
I believe the auto_cleanup feature is cleaning the text versions that Instapaper has already created. Is that true?
If so, how would I go about making the recipe open the original links in the feed and then apply auto_cleanup directly to the web pages themselves? I find that Instapaper's text feature gives a few too many "Page not available" results, and that Readability does a somewhat better job.
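To make sure I'm describing it right, here is a rough, untested sketch of what I imagine, meant to replace parse_index and print_version in the recipe above. The 'titleRow' class name and the assumption that the title anchor points at the original site are guesses on my part, not something I've verified against Instapaper's markup.
Code:
    def parse_index(self):
        # Guesswork: collect the original article URLs (absolute hrefs) instead
        # of Instapaper's relative /text links, so auto_cleanup runs on the
        # source pages themselves.
        totalfeeds = []
        for feedtitle, feedurl in self.get_feeds():
            articles = []
            soup = self.index_to_soup(feedurl)
            self.myFormKey = soup.find('input', attrs={'name': 'form_key'})['value']
            for item in soup.findAll('div', attrs={'class': 'titleRow'}):  # class name is a guess
                atag = item.find('a')
                if atag and atag.get('href', '').startswith('http'):
                    articles.append({'url': atag['href']})
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        # The collected URLs are already absolute, so nothing needs prefixing
        return url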
Lastly, the "Archive All" feature is fine, but is there a way to archive articles as they are opened and packaged? That way, if someone wanted to download only a few articles, their entire collection wouldn't be archived.
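Something like the following is roughly what I have in mind, although it is pure guesswork: the /archive-article endpoint and the article_id field below are hypothetical and would need to be replaced with whatever Instapaper's per-article Archive button actually posts.
Code:
    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('title').contents[0].strip()
        try:
            # Hypothetical per-article archive request, sent right after the
            # article has been downloaded and packaged; endpoint and field
            # names are guesses.
            params = urllib.urlencode(dict(form_key=self.myFormKey,
                                           article_id=article.url.rstrip('/').split('/')[-1]))
            self.browser.open('http://www.instapaper.com/archive-article', params)
        except Exception:
            self.log.warn('Could not archive %s' % article.url)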
Thanks for any feedback!
(This recipe stuff is cool!)