Instapaper recipe - broken by site redesign?

adfadfsasdfafafd · 05-11-2014, 01:20 PM

The Instapaper website was redesigned late last week. Since then, the recipe hasn't worked for me: it appears to download only the starred items (in my case none, so I get an empty file) rather than the whole list.

DavidFT · 05-14-2014, 05:17 AM

Same problem here! Is anyone able to fix it? (Unfortunately, I am not!)

adfadfsasdfafafd · 05-16-2014, 05:44 AM

OK, I just spent an hour looking into this (from scratch - I'm not a programmer...), and I think I have a fix: just replace (in the stable recipe)

Quote:

for item in soup.findAll('div', attrs={'class':'cornerControls'}):

with

Quote:

for item in soup.findAll('div', attrs={'class':'js_title_row title_row'}):
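A minimal sketch of why this kind of selector is fragile, assuming (as the behaviour reported in this thread suggests) that the recipe compares the selector against the whole class string. The helper name `has_classes` is hypothetical and not part of calibre or BeautifulSoup; it only illustrates matching on individual class names instead of the full string.

```python
def has_classes(class_attr, wanted):
    """Return True if every class name in `wanted` appears in `class_attr`."""
    tokens = set(class_attr.split())
    return set(wanted.split()) <= tokens


# Class string as it appears in the fix above.
new_markup = "js_title_row title_row"

# Whole-string comparison: stops matching as soon as the site adds a class.
print("title_row" == new_markup)

# Per-class comparison: still matches when extra classes are present.
print(has_classes(new_markup, "title_row"))
```

Matching on a single class name would probably survive small markup changes better than matching the exact class string, though it cannot protect against a class being renamed.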

DavidFT · 05-17-2014, 05:34 AM

Hi adfadfsasdfafafd,

thanks so much for your efforts! I had already replaced 'cornerControls' with 'title_row' quite some time ago, when the script had stopped working.
That made it function again until last week.

Now I tried the variant you recommended, 'js_title_row title_row', and indeed, the articles are downloaded. That's a big improvement! However, the articles are now preceded by a lengthy list: Instapaper, MOVE, Home, Lyon; Tisa, Helvetica; Georgia, Share, Email, Facebook etc., each on its own line.
Also, some markup is not processed; for instance, one of the titles reads: "The <i>New York Times</i> on the Precipice."

Do you have these issues as well?

adfadfsasdfafafd · 05-18-2014, 04:02 AM

I independently noticed some of these issues and dealt with them just now (getting rid of the lengthy list at the beginning, and also the Evernote etc. links at the end). I also added some improvements from this post:

https://www.mobileread.com/forums/sho...7&postcount=69

I've probably wasted enough time on this now, but I hope it's helpful. I haven't noticed the issue with markup in any of my article titles, so I am not going to worry about that for now! The full script is below.

# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/project...alibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    oldest_article = 0
    no_stylesheets = False
    extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
    remove_javascript = True
    remove_tags = [
        dict(name='div', attrs={'id':'text_controls_toggle'})
        ,dict(name='script')
        ,dict(name='div', attrs={'id':'text_controls'})
        ,dict(name='section', attrs={'class':'primary_bar'})
        ,dict(name='div', attrs={'class':'modal_group'})
        ,dict(name='div', attrs={'id':'editing_controls'})
        ,dict(name='div', attrs={'class':'modal_name'})
        ,dict(name='div', attrs={'class':'highlight_popover'})
        ,dict(name='div', attrs={'class':'bar bottom'})
        ,dict(name='div', attrs={'id':'controlbar_container'})
        ,dict(name='div', attrs={'id':'footer'})
        ,dict(name='label')
    ]
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u')
    ]

    #Adds the title tag to the body of the recipe. Use this if your articles miss headings.
    add_title_tag = False;

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'js_title_row title_row'}):
                #description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({
                        'url' :url
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('title').contents[0].strip()

    def postprocess_html(self, soup, first_fetch):
        #adds the title to each story, as it is not always included
        if self.add_title_tag:
            for link_tag in soup.findAll(attrs={"id" : "story"}):
                link_tag.insert(0,'<h1>'+soup.find('title').contents[0].strip()+'</h1>')

        #print repr(soup)
        return soup
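The `print_version` method in the script builds the article address by plain string concatenation, which only works while every link on the index page is relative. As a hedged alternative, here is a small standalone sketch using the standard library's `urljoin`. It assumes Python 3 (on the Python 2 builds of calibre current at the time of this thread, the same function lived in the `urlparse` module), and the `/read/12345` path is a made-up example, not a real Instapaper URL.

```python
from urllib.parse import urljoin

INDEX = 'http://www.instapaper.com'


def print_version(url):
    # urljoin resolves a relative link against INDEX and leaves
    # an already absolute link untouched.
    return urljoin(INDEX, url)


print(print_version('/read/12345'))
print(print_version('https://www.instapaper.com/read/12345'))
```

With concatenation, the second call would have produced a doubled host name; with `urljoin` both forms should resolve to a usable address.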

adfadfsasdfafafd · 05-19-2014, 05:35 AM

Looks like this is now in the official version:

https://github.com/kovidgoyal/calibr...30a07d3e1e42df

DavidFT · 05-20-2014, 02:40 PM

Thanks very much for the corrected recipe, it works perfectly! The problem with the markups seems to have been unrelated to the recipe and is gone as well!

Cheers,

David

charlesnadeau · 05-29-2014, 03:38 AM

I used the updated recipe but when I try to fetch the unread item, nothing gets downloaded. Here is the command line I use on my Ubuntu machine:

Code:

/usr/bin/ebook-convert /usr/share/calibre/recipes/instapaper140518.recipe ~/Documents/pourkindle/instapapercustom`date +"%Y%m%d"`0.mobi --output-profile kindle_dx --username myusername --password mypassword

Is there something else I should change inside the recipe?
Thanks!

Charles
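For anyone who wants to script the same call, the command above can be assembled in Python. This is only a sketch: the helper name `build_command` is hypothetical, the options are exactly the ones from the command line above, and the output name mirrors the backtick `date` substitution, including the trailing `0`. It assumes `ebook-convert` is on the `PATH`.

```python
import os
from datetime import date


def build_command(recipe, out_dir, username, password, today=None):
    """Build the ebook-convert argument list used in the post above."""
    today = today or date.today()
    # Same naming scheme as instapapercustom`date +"%Y%m%d"`0.mobi
    filename = "instapapercustom{}0.mobi".format(today.strftime("%Y%m%d"))
    # "~" is expanded by the shell, not by subprocess, so expand it here.
    output = os.path.join(os.path.expanduser(out_dir), filename)
    return [
        "ebook-convert", recipe, output,
        "--output-profile", "kindle_dx",
        "--username", username,
        "--password", password,
    ]


cmd = build_command(
    "/usr/share/calibre/recipes/instapaper140518.recipe",
    "~/Documents/pourkindle",
    "myusername",
    "mypassword",
)
print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd)`; passing a list avoids shell quoting problems with passwords that contain special characters.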

DavidFT · 05-30-2014, 04:59 AM

Quote:

Originally Posted by charlesnadeau

I used the updated recipe but when I try to fetch the unread item, nothing gets downloaded. Here is the command line I use on my Ubuntu machine:

Code:

/usr/bin/ebook-convert /usr/share/calibre/recipes/instapaper140518.recipe ~/Documents/pourkindle/instapapercustom`date +"%Y%m%d"`0.mobi --output-profile kindle_dx --username myusername --password mypassword

Is there something else I should change inside the recipe?
Thanks!

Charles

Hi Charles,

Is the updated recipe you are using identical to the one adfadfsasdfafafd posted above in this thread? If not, you might check whether the latter works. It does for me!

David

cendalc · 05-30-2014, 06:18 PM

adfadfsasdfafafd's version does not work for me, so here is my version:

Code:

# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/projects/instapaper-calibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
	title = u'Instapaper'
	__author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
	publisher = 'Instapaper.com'
	category = 'info, custom, Instapaper'
	oldest_article = 365
	max_articles_per_feed = 100
	reverse_article_order = True
	no_stylesheets = False
	extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
	remove_javascript = True
	remove_tags = [
		dict(name='div', attrs={'id':'text_controls_toggle'}),
		dict(name='script'),
		dict(name='div', attrs={'id':'text_controls'}),
		dict(name='section', attrs={'class':'primary_bar'}),
		dict(name='div', attrs={'class':'modal_group'}),
		dict(name='div', attrs={'id':'editing_controls'}),
		dict(name='div', attrs={'class':'modal_name'}),
		dict(name='div', attrs={'class':'highlight_popover'}),
		dict(name='div', attrs={'class':'bar bottom'}),
		dict(name='div', attrs={'id':'controlbar_container'}),
		dict(name='div', attrs={'id':'footer'}),
		dict(name='label')
	]
	use_embedded_content = False
	needs_subscription = True
	INDEX = u'http://www.instapaper.com'
	LOGIN = INDEX + u'/user/login'

	feeds = [
		(u'Instapaper Unread', u'https://www.instapaper.com/u'),
		(u'Instapaper Starred', u'http://www.instapaper.com/starred')
	]

	def get_browser(self):
		br = BasicNewsRecipe.get_browser(self)
		if self.username is not None:
			br.open(self.LOGIN)
			br.select_form(nr=0)
			br['username'] = self.username
			if self.password is not None:
				br['password'] = self.password
			br.submit()
		return br

	def parse_index(self):
		totalfeeds = []
		lfeeds = self.get_feeds()
		for feedobj in lfeeds:
			feedtitle, feedurl = feedobj
			self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
			articles = []
			soup = self.index_to_soup(feedurl)
			for item in soup.findAll('a', attrs={'class': 'article_title'}):
				articles.append({
					'url': item['href'],
					'title': item['title']
				})
			totalfeeds.append((feedtitle, articles))
		return totalfeeds

	def print_version(self, url):
		return 'http://www.instapaper.com' + url
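The loop in `parse_index` above reads `item['href']` and `item['title']` directly, so a link without one of those attributes would raise an error. Below is a rough standalone sketch of the same extraction using only the standard library, so it can be tried outside calibre. The `ArticleLinkParser` class and the `SAMPLE` markup are assumptions made for illustration; the real Instapaper page may look different, and only the `article_title` class name comes from the recipe above.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect url/title pairs from <a class="article_title"> links."""

    def __init__(self):
        super().__init__()
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        # Skip links without the expected class or without an href.
        if 'article_title' in classes and attrs.get('href'):
            self.articles.append({
                'url': attrs['href'],
                'title': attrs.get('title') or '',
            })


# Hypothetical markup, loosely modelled on the selectors in this thread.
SAMPLE = '''
<div class="js_title_row title_row">
  <a class="article_title" href="/read/1" title="First article">First article</a>
</div>
<a class="other" href="/ignored">Ignored</a>
<a class="article_title">No link</a>
'''

parser = ArticleLinkParser()
parser.feed(SAMPLE)
print(parser.articles)
```

Inside the recipe itself, the equivalent guard would be to check that the attributes exist before appending the article, rather than switching parsers.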

charlesnadeau · 06-01-2014, 11:18 AM

Quote:

Originally Posted by cendalc

adfadfsasdfafafd's version does not work for me, so here is my version:
[CODE]

It works perfectly for me, thanks! adfadfsasdfafafd's version wasn't working for me either.

Charles

raidenlee · 06-02-2014, 09:31 AM

Quote:

Originally Posted by cendalc

adfadfsasdfafafd's version does not work for me, so here is my version:

Code:

Waiting on my Kobo Glo to arrive. Decided to prep ahead by finding ways to download my Instapaper saves to be read on the Kobo.

I went through many websites and forum posts to arrive here. Thanks for finding a way to solve the issue!

This worked well!
