MobileRead Forums > E-Book Software > Calibre > Recipes

05-11-2014, 12:20 PM #1
adfadfsasdfafafd
Enthusiast
adfadfsasdfafafd has learned how to buy an e-book online
 
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
Instapaper recipe - broken by site redesign?

The Instapaper website was redesigned late last week. Since then, the recipe hasn't worked for me: it appears to download only the starred items (in my case there are none, so I get an empty file) rather than the whole list.

05-14-2014, 04:17 AM #2
DavidFT
Junior Member
DavidFT began at the beginning.
 
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
Same problem here! Is anyone able to fix it? (Unfortunately, I am not!)

05-16-2014, 04:44 AM #3
adfadfsasdfafafd
Enthusiast
adfadfsasdfafafd has learned how to buy an e-book online
 
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
OK, I just spent an hour looking into this (from scratch; I'm not a programmer...), and I think I have a fix. In the stable recipe, just replace

Code:
for item in soup.findAll('div', attrs={'class':'cornerControls'}):
with

Code:
for item in soup.findAll('div', attrs={'class':'js_title_row title_row'}):
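For readers not fluent in the recipe code: that line makes the recipe walk the unread-list page, pick every div whose class is `js_title_row title_row`, and take the first link inside each one as an article URL. Below is a rough stdlib-only sketch of that loop, using Python 3's html.parser in place of calibre's BeautifulSoup. The sample markup is invented for illustration (not copied from Instapaper), and `title_row_links` is a hypothetical helper name. It also shows that a selector for a class that is no longer on the page simply finds nothing.

```python
from html.parser import HTMLParser


class TitleRowLinks(HTMLParser):
    """Collect the href of the first <a> inside each <div> whose class
    attribute equals `wanted_class` as a whole string."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0          # nesting depth inside a matching <div>
        self.took_link = False  # only the first <a> per row, like item.a
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("class") == self.wanted_class:
                self.depth = 1
                self.took_link = False
        elif tag == "a" and self.depth and not self.took_link:
            self.took_link = True
            if attrs.get("href"):
                self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def title_row_links(html, wanted_class):
    parser = TitleRowLinks(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical markup shaped like the redesigned list page
SAMPLE = """
<div class="js_title_row title_row"><a href="/read/111">First</a></div>
<div class="js_title_row title_row"><a href="/read/222">Second</a></div>
"""

print(title_row_links(SAMPLE, "js_title_row title_row"))  # ['/read/111', '/read/222']
print(title_row_links(SAMPLE, "cornerControls"))          # []
```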

05-17-2014, 04:34 AM #4
DavidFT
Junior Member
DavidFT began at the beginning.
 
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
Hi adfadfsasdfafafd,

thanks so much for your efforts! I had already replaced 'cornerControls' with 'title_row' quite some time ago, when the script stopped working.
That made it function again until last week.
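A likely explanation, offered as an assumption rather than something checked against calibre's source: the BeautifulSoup 3 bundled with calibre at the time compared `attrs={'class': ...}` against the whole class string, while HTML itself treats class as a whitespace-separated list of names. If the row div's class used to be exactly `title_row` and the redesign changed it to `js_title_row title_row`, a whole-string match would stop finding it. The two kinds of comparison side by side (the function names are made up for this sketch):

```python
def exact_match(class_attr, wanted):
    """Whole-string comparison of the class attribute."""
    return class_attr == wanted


def token_match(class_attr, wanted):
    """HTML semantics: class is a whitespace-separated list of names."""
    return wanted in class_attr.split()


new_row = "js_title_row title_row"  # row class after the redesign, per this thread

print(exact_match(new_row, "title_row"))               # False
print(token_match(new_row, "title_row"))               # True
print(exact_match(new_row, "js_title_row title_row"))  # True
```

Under whole-string matching, the full `'js_title_row title_row'` value is the one that keeps working, which is what the fix in post #3 uses.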

Now I tried the variant you recommended, 'js_title_row title_row', and indeed, the articles are downloaded. That's a big improvement! However, the articles are now preceded by a lengthy list: Instapaper, MOVE, Home, Lyon, Tisa, Helvetica, Georgia, Share, Email, Facebook, etc., each on its own line.
Also, some markup is not processed; for instance, one of the titles reads: "The <i>New York Times</i> on the Precipice."
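If a title really does come through with literal tags like that, one way to flatten it is to keep only the text nodes. Here is a minimal sketch with Python 3's html.parser. It is not part of any recipe in this thread, and `strip_markup` is a hypothetical name:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # only text nodes are kept; tags such as <i> are dropped
        self.parts.append(data)


def strip_markup(title):
    parser = _TextCollector()
    parser.feed(title)
    parser.close()  # flush any buffered text
    # collapse runs of whitespace left behind by removed tags
    return " ".join("".join(parser.parts).split())


print(strip_markup("The <i>New York Times</i> on the Precipice."))
# The New York Times on the Precipice.
```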

Do you have these issues as well?

05-18-2014, 03:02 AM #5
adfadfsasdfafafd
Enthusiast
adfadfsasdfafafd has learned how to buy an e-book online
 
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
I noticed some of these issues independently and have just dealt with them (getting rid of the lengthy list at the beginning, and also the Evernote etc. links at the end). I also added some improvements from this post:

https://www.mobileread.com/forums/sho...7&postcount=69

I've probably wasted enough time on this now, but I hope it's helpful. I haven't noticed the issue with markup in any of my article titles, so I am not going to worry about that for now! The full script is below.

Code:
# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/project...alibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    oldest_article = 0  # this later assignment overrides oldest_article = 365 above
    no_stylesheets = False
    extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
    remove_javascript = True
    # page furniture to strip: reader controls, menus, popovers, footer
    remove_tags = [
        dict(name='div', attrs={'id':'text_controls_toggle'})
        ,dict(name='script')
        ,dict(name='div', attrs={'id':'text_controls'})
        ,dict(name='section', attrs={'class':'primary_bar'})
        ,dict(name='div', attrs={'class':'modal_group'})
        ,dict(name='div', attrs={'id':'editing_controls'})
        ,dict(name='div', attrs={'class':'modal_name'})
        ,dict(name='div', attrs={'class':'highlight_popover'})
        ,dict(name='div', attrs={'class':'bar bottom'})
        ,dict(name='div', attrs={'id':'controlbar_container'})
        ,dict(name='div', attrs={'id':'footer'})
        ,dict(name='label')
    ]
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u')
    ]

    # Adds the title tag to the body of the recipe. Use this if your articles miss headings.
    add_title_tag = False

    def get_browser(self):
        # log in with the username/password supplied to calibre
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            # each article on the redesigned list page sits in one of these divs
            for item in soup.findAll('div', attrs={'class':'js_title_row title_row'}):
                #description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({
                        'url' :url
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('title').contents[0].strip()

    def postprocess_html(self, soup, first_fetch):
        # adds the title to each story, as it is not always included
        if self.add_title_tag:
            for link_tag in soup.findAll(attrs={"id" : "story"}):
                link_tag.insert(0,'<h1>'+soup.find('title').contents[0].strip()+'</h1>')

        #print repr(soup)
        return soup
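The remove_tags list is what gets rid of the menu at the top and the share links at the bottom. Each dict names a tag and, optionally, attribute values, and every matching element is dropped along with its contents. Here is a rough stdlib-only model of that behaviour. It is not calibre's actual implementation: it assumes attribute values are compared as whole strings and that the page's tags are properly nested. The sample page and the helper names (`spec_matches`, `visible_text`) are invented for this sketch.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect the depth count.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# A few entries copied from the recipe's remove_tags list.
REMOVE_TAGS = [
    dict(name="section", attrs={"class": "primary_bar"}),
    dict(name="div", attrs={"id": "footer"}),
    dict(name="label"),
]


def spec_matches(spec, name, attrs):
    """True if tag `name` with attribute dict `attrs` matches one spec."""
    if "name" in spec and spec["name"] != name:
        return False
    # every listed attribute must be present with exactly this value
    return all(attrs.get(k) == v for k, v in spec.get("attrs", {}).items())


class VisibleText(HTMLParser):
    """Collect text that is not inside an element matched by a spec."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.skip_depth = 0  # >0 while inside a removed element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif any(spec_matches(s, tag, dict(attrs)) for s in self.specs):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.text.append(data.strip())


def visible_text(html, specs=REMOVE_TAGS):
    parser = VisibleText(specs)
    parser.feed(html)
    parser.close()
    return parser.text


# Invented page: a menu bar, the article, and a footer with share links.
PAGE = """
<section class="primary_bar">Instapaper Home</section>
<div id="story"><p>Article text.</p><label>Move</label></div>
<div id="footer">Share Email Facebook</div>
"""

print(visible_text(PAGE))  # ['Article text.']
```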

05-19-2014, 04:35 AM #6
adfadfsasdfafafd
Enthusiast
adfadfsasdfafafd has learned how to buy an e-book online
 
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
Looks like this is now in the official version:

https://github.com/kovidgoyal/calibr...30a07d3e1e42df

05-20-2014, 01:40 PM #7
DavidFT
Junior Member
DavidFT began at the beginning.
 
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
Thanks very much for the corrected recipe; it works perfectly! The markup problem seems to have been unrelated to the recipe, and it is gone as well!

Cheers,

David

05-29-2014, 02:38 AM #8
charlesnadeau
Junior Member
charlesnadeau began at the beginning.
 
Posts: 2
Karma: 10
Join Date: May 2014
Device: Kindle DX
Change in the feed location

I used the updated recipe, but when I try to fetch the unread items, nothing gets downloaded. Here is the command line I use on my Ubuntu machine:
Code:
/usr/bin/ebook-convert /usr/share/calibre/recipes/instapaper140518.recipe ~/Documents/pourkindle/instapapercustom`date +"%Y%m%d"`0.mobi --output-profile kindle_dx --username myusername --password mypassword
Is there something else I should change inside the recipe?
Thanks!

Charles

05-30-2014, 03:59 AM #9
DavidFT
Junior Member
DavidFT began at the beginning.
 
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
Quote:
Originally Posted by charlesnadeau View Post
I used the updated recipe but when I try to fetch the unread item, nothing gets downloaded. Here is the command line I use on my Ubuntu machine:
Is there something else I should change inside the recipe?
Hi Charles,

is the updated recipe identical to the one adfadfsasdfafafd posted above in this thread? If not, you might check whether that one works. It does for me!

David

05-30-2014, 05:18 PM #10
cendalc
Junior Member
cendalc began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Jul 2011
Device: Nook
adfadfsasdfafafd's version does not work for me, so here is my version:
Code:
# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/projects/instapaper-calibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
	title = u'Instapaper'
	__author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
	publisher = 'Instapaper.com'
	category = 'info, custom, Instapaper'
	oldest_article = 365
	max_articles_per_feed = 100
	reverse_article_order = True
	no_stylesheets = False
	extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
	remove_javascript = True
	remove_tags = [
		dict(name='div', attrs={'id':'text_controls_toggle'}),
		dict(name='script'),
		dict(name='div', attrs={'id':'text_controls'}),
		dict(name='section', attrs={'class':'primary_bar'}),
		dict(name='div', attrs={'class':'modal_group'}),
		dict(name='div', attrs={'id':'editing_controls'}),
		dict(name='div', attrs={'class':'modal_name'}),
		dict(name='div', attrs={'class':'highlight_popover'}),
		dict(name='div', attrs={'class':'bar bottom'}),
		dict(name='div', attrs={'id':'controlbar_container'}),
		dict(name='div', attrs={'id':'footer'}),
		dict(name='label')
	]
	use_embedded_content = False
	needs_subscription = True
	INDEX = u'http://www.instapaper.com'
	LOGIN = INDEX + u'/user/login'

	feeds = [
		(u'Instapaper Unread', u'https://www.instapaper.com/u'),
		(u'Instapaper Starred', u'http://www.instapaper.com/starred')
	]

	def get_browser(self):
		br = BasicNewsRecipe.get_browser(self)
		if self.username is not None:
			br.open(self.LOGIN)
			br.select_form(nr=0)
			br['username'] = self.username
			if self.password is not None:
				br['password'] = self.password
			br.submit()
		return br

	def parse_index(self):
		totalfeeds = []
		lfeeds = self.get_feeds()
		for feedobj in lfeeds:
			feedtitle, feedurl = feedobj
			self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
			articles = []
			soup = self.index_to_soup(feedurl)
			for item in soup.findAll('a', attrs={'class': 'article_title'}):
				articles.append({
					'url': item['href'],
					'title': item['title']
				})
			totalfeeds.append((feedtitle, articles))
		return totalfeeds

	def print_version(self, url):
		return 'http://www.instapaper.com' + url
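The main change in this version is the selector. Instead of row divs, it collects every `<a class="article_title">` and reads both the `href` and the `title` attribute from the link itself. Here is a stdlib-only sketch of that step, with html.parser standing in for calibre's BeautifulSoup and invented sample markup; `article_links` is a hypothetical helper. One deliberate difference: `item['title']` in the recipe fails when a link has no title attribute, whereas this sketch falls back to the link text.

```python
from html.parser import HTMLParser


class ArticleTitleLinks(HTMLParser):
    """Collect {'url', 'title'} for each <a class="article_title"> link."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self.current = None  # article whose link text is being read
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # whole-string class comparison (assumed to mirror the recipe's findAll)
        if tag == "a" and attrs.get("class") == "article_title" and attrs.get("href"):
            self.current = {"url": attrs["href"], "title": attrs.get("title")}
            self.text = []

    def handle_data(self, data):
        if self.current is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            if not self.current["title"]:
                # no title attribute: fall back to the visible link text
                self.current["title"] = " ".join("".join(self.text).split())
            self.articles.append(self.current)
            self.current = None


def article_links(html):
    parser = ArticleTitleLinks()
    parser.feed(html)
    parser.close()
    return parser.articles


# Invented markup; the second link has no title attribute.
SAMPLE = """
<a class="article_title" href="/read/111" title="First article">First article</a>
<a class="article_title" href="/read/222">Second article</a>
<a class="js_star" href="/star/111">Star</a>
"""

for article in article_links(SAMPLE):
    print(article)
# {'url': '/read/111', 'title': 'First article'}
# {'url': '/read/222', 'title': 'Second article'}
```

The recipe's print_version then prepends http://www.instapaper.com to each href, which assumes the page's links are relative, as they are in this sample.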

06-01-2014, 10:18 AM #11
charlesnadeau
Junior Member
charlesnadeau began at the beginning.
 
Posts: 2
Karma: 10
Join Date: May 2014
Device: Kindle DX
Quote:
Originally Posted by cendalc View Post
adfadfsasdfafafd version does not work for me so here is my version:
It works perfectly for me, thanks! adfadfsasdfafafd's version wasn't working for me either.

Charles

06-02-2014, 08:31 AM #12
raidenlee
Junior Member
raidenlee began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jun 2014
Device: Kobo Glo
Quote:
Originally Posted by cendalc View Post
adfadfsasdfafafd version does not work for me so here is my version:

I'm waiting for my Kobo Glo to arrive, so I decided to prepare ahead by finding a way to download my Instapaper saves and read them on the Kobo.

I went through many websites and forum posts to arrive here. Thanks for finding a way to solve the issue!

This worked well!

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Custom Instapaper Recipe | haroldtreen | Recipes | 9 | 05-27-2025 06:10 PM
Instapaper - Updated recipe | khromov | Recipes | 78 | 01-23-2015 01:09 AM
New York Times site redesign | nelson1379 | Recipes | 21 | 02-13-2014 09:22 PM
The Independent : Updated recipe for 2011 site redesign | NotTaken | Recipes | 22 | 12-14-2012 12:01 PM
FAZ.NET recipe fails due to website redesign | juco | Recipes | 7 | 10-07-2011 11:53 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.