#1 |
Enthusiast
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
|
Instapaper recipe - broken by site redesign?
The Instapaper website was redesigned late last week. Since then, the recipe hasn't worked for me: it appears to download only the starred items (in my case none, so I get an empty file) rather than the whole list.
|
#2 |
Junior Member
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
|
Same problem here! Is anyone able to fix it? (Unfortunately, I am not!)
|
#3 | ||
Enthusiast
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
|
OK, I just spent an hour looking into this (from scratch - I'm not a programmer...), and I think I have a fix: just replace (in the stable recipe)
Quote:
Quote:
|
||
#4 |
Junior Member
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
|
Hi adfadfsasdfafafd,
thanks so much for your efforts! I had already replaced 'cornerControls' with 'title_row' quite some time ago, when the script first stopped working. That made it work again until last week. Now I tried the variant you recommended, 'js_title_row title_row', and indeed the articles are downloaded. That's a big improvement! However, the articles are now preceded by a lengthy list: Instapaper, MOVE, Home, Lyon; Tisa, Helvetica; Georgia, Share, Email Facebook etc., each on a single line. Also, some markup is not processed; for instance, one of the titles reads: "The <i>New York Times</i> on the Precipice." Do you have these issues as well? |
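For anyone hitting this again after a future redesign: the move from 'title_row' to 'js_title_row title_row' suggests the recipe's lookup compares the whole class attribute as one string, so any class name the site adds breaks it. Below is a minimal standalone sketch (standard library only, not calibre's API) that matches on a single class token instead. The `TitleRowLinks` class, the `title_row_links` helper, and the sample snippets are made up for illustration; the real Instapaper markup may differ.

```python
from html.parser import HTMLParser


class TitleRowLinks(HTMLParser):
    """Collect the first link inside every <div> whose class list contains `token`."""

    def __init__(self, token="title_row"):
        super().__init__()
        self.token = token
        self.depth = 0       # nesting depth inside the current matching div
        self.taken = False   # True once the current row has yielded a link
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                # A nested div inside a matching row: track it so we know
                # when the outer row really ends.
                self.depth += 1
            elif self.token in (attrs.get("class") or "").split():
                # Compare against individual class tokens, not the full string.
                self.depth = 1
                self.taken = False
        elif tag == "a" and self.depth and not self.taken and attrs.get("href"):
            self.links.append(attrs["href"])
            self.taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def title_row_links(html, token="title_row"):
    """Return the hrefs found in rows carrying the given class token."""
    parser = TitleRowLinks(token)
    parser.feed(html)
    return parser.links


# Hypothetical markup: before and after a redesign that adds a second class.
old_page = '<div class="title_row"><a href="/read/1">One</a></div>'
new_page = '<div class="js_title_row title_row"><a href="/read/2">Two</a></div>'
other = '<div class="sidebar"><a href="/help">Help</a></div>'
```

With this approach both `old_page` and `new_page` yield their link, while the unrelated `sidebar` div is ignored, so an extra class name alone would not empty the download.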
#5 |
Enthusiast
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
|
I independently noticed some of these issues and dealt with them just now (getting rid of the lengthy list at the beginning, and also the Evernote etc links at the end). I also added some improvements from this post:
https://www.mobileread.com/forums/sho...7&postcount=69
I've probably wasted enough time on this now, but I hope it's helpful. I haven't noticed the issue with markup in any of my article titles, so I am not going to worry about that for now! The full script is below.
Code:
# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/project...alibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    oldest_article = 0
    no_stylesheets = False
    extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
    remove_javascript = True
    remove_tags = [
        dict(name='div', attrs={'id':'text_controls_toggle'}),
        dict(name='script'),
        dict(name='div', attrs={'id':'text_controls'}),
        dict(name='section', attrs={'class':'primary_bar'}),
        dict(name='div', attrs={'class':'modal_group'}),
        dict(name='div', attrs={'id':'editing_controls'}),
        dict(name='div', attrs={'class':'modal_name'}),
        dict(name='div', attrs={'class':'highlight_popover'}),
        dict(name='div', attrs={'class':'bar bottom'}),
        dict(name='div', attrs={'id':'controlbar_container'}),
        dict(name='div', attrs={'id':'footer'}),
        dict(name='label')
    ]
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'http://www.instapaper.com/u')
    ]

    #Adds the title tag to the body of the recipe. Use this if your articles miss headings.
    add_title_tag = False

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('div', attrs={'class':'js_title_row title_row'}):
                #description = self.tag_to_string(item.div)
                atag = item.a
                if atag and atag.has_key('href'):
                    url = atag['href']
                    articles.append({
                        'url' :url
                    })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url

    def populate_article_metadata(self, article, soup, first):
        article.title = soup.find('title').contents[0].strip()

    def postprocess_html(self, soup, first_fetch):
        #adds the title to each story, as it is not always included
        if self.add_title_tag:
            for link_tag in soup.findAll(attrs={"id" : "story"}):
                link_tag.insert(0,'<h1>'+soup.find('title').contents[0].strip()+'</h1>')
        #print repr(soup)
        return soup
|
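A side note on `print_version` in the script above: it builds the article address by plain string concatenation, which assumes every href on the index page is site-relative. A hedged alternative, in case some links ever come back absolute, is `urljoin` from the Python 3 standard library. `absolute_url` is a hypothetical helper name and not part of the recipe; only the base address is taken from the script.

```python
from urllib.parse import urljoin

# Base address copied from the recipe's INDEX attribute.
INDEX = "http://www.instapaper.com"


def absolute_url(href, base=INDEX):
    """Resolve href against the site root; absolute links pass through unchanged."""
    return urljoin(base + "/", href)
```

Inside a recipe this could be called from `print_version` as `return absolute_url(url)`, though I have not tested that against the live site.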
#6 |
Enthusiast
Posts: 27
Karma: 76
Join Date: May 2014
Device: Kindle 3
|
Looks like this is now in the official version:
https://github.com/kovidgoyal/calibr...30a07d3e1e42df |
#7 |
Junior Member
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
|
Thanks very much for the corrected recipe; it works perfectly! The markup problem seems to have been unrelated to the recipe and is gone as well!
Cheers, David |
#8 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2014
Device: Kindle DX
|
Change in the feed location
I used the updated recipe, but when I try to fetch the unread items, nothing gets downloaded. Here is the command line I use on my Ubuntu machine:
Code:
/usr/bin/ebook-convert /usr/share/calibre/recipes/instapaper140518.recipe ~/Documents/pourkindle/instapapercustom`date +"%Y%m%d"`0.mobi --output-profile kindle_dx --username myusername --password mypassword
Thanks!
Charles |
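For scheduled runs, the same command can be assembled in Python instead of relying on shell backticks for the date. This is only a sketch: `build_command` is a hypothetical helper, the paths and the `kindle_dx` profile are copied from the command above, and the credentials are placeholders.

```python
import datetime
import os


def build_command(recipe, out_dir, username, password, today=None):
    """Build the ebook-convert argument list with a dated output file name."""
    today = today or datetime.date.today()
    # Same naming scheme as the shell version: instapapercustomYYYYMMDD0.mobi
    name = "instapapercustom%s0.mobi" % today.strftime("%Y%m%d")
    # Expand "~" here because no shell will do it for us.
    output = os.path.join(os.path.expanduser(out_dir), name)
    return [
        "/usr/bin/ebook-convert", recipe, output,
        "--output-profile", "kindle_dx",
        "--username", username,
        "--password", password,
    ]


cmd = build_command(
    "/usr/share/calibre/recipes/instapaper140518.recipe",
    "~/Documents/pourkindle",
    "myusername",
    "mypassword",
)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which fails loudly if the conversion exits with an error.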
#9 | |
Junior Member
Posts: 4
Karma: 10
Join Date: May 2014
Device: Kindle 4 NT
|
Quote:
Is the updated recipe identical to the one adfadfsasdfafafd posted above in this thread? If not, you might check whether the latter works. It does for me! David |
|
#10 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2011
Device: Nook
|
adfadfsasdfafafd's version does not work for me, so here is my version:
Code:
# Calibre recipe for Instapaper.com (Stable version)
#
# Homepage: http://khromov.wordpress.com/projects/instapaper-calibre-recipe/
# Code Repository: https://bitbucket.org/khromov/calibre-instapaper

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1299694372(BasicNewsRecipe):
    title = u'Instapaper'
    __author__ = 'Darko Miletic, Stanislav Khromov, Jim Ramsay'
    publisher = 'Instapaper.com'
    category = 'info, custom, Instapaper'
    oldest_article = 365
    max_articles_per_feed = 100
    reverse_article_order = True
    no_stylesheets = False
    extra_css = 'q { font-style: italic; } .size3mode { color: black; }'
    remove_javascript = True
    remove_tags = [
        dict(name='div', attrs={'id':'text_controls_toggle'}),
        dict(name='script'),
        dict(name='div', attrs={'id':'text_controls'}),
        dict(name='section', attrs={'class':'primary_bar'}),
        dict(name='div', attrs={'class':'modal_group'}),
        dict(name='div', attrs={'id':'editing_controls'}),
        dict(name='div', attrs={'class':'modal_name'}),
        dict(name='div', attrs={'class':'highlight_popover'}),
        dict(name='div', attrs={'class':'bar bottom'}),
        dict(name='div', attrs={'id':'controlbar_container'}),
        dict(name='div', attrs={'id':'footer'}),
        dict(name='label')
    ]
    use_embedded_content = False
    needs_subscription = True
    INDEX = u'http://www.instapaper.com'
    LOGIN = INDEX + u'/user/login'

    feeds = [
        (u'Instapaper Unread', u'https://www.instapaper.com/u'),
        (u'Instapaper Starred', u'http://www.instapaper.com/starred')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0)
            br['username'] = self.username
            if self.password is not None:
                br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl)
            for item in soup.findAll('a', attrs={'class': 'article_title'}):
                articles.append({
                    'url': item['href'],
                    'title': item['title']
                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds

    def print_version(self, url):
        return 'http://www.instapaper.com' + url
|
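One thing to watch in this version: `item['href']` and `item['title']` raise an error if a matched link lacks either attribute, which would abort the whole fetch. Below is a small sketch of a more forgiving way to build the `(feed title, articles)` pair that `parse_index` returns. `build_feed` and the "Untitled" fallback are my own assumptions, and the anchors are modelled as plain dicts so the snippet runs without calibre.

```python
def build_feed(feedtitle, anchors):
    """Turn a list of anchor attribute dicts into a (feed title, articles) pair."""
    articles = []
    for attrs in anchors:
        url = attrs.get("href")
        if not url:
            # Skip links with no target instead of raising KeyError.
            continue
        articles.append({
            "url": url,
            # Fall back to a placeholder when the title attribute is missing.
            "title": attrs.get("title") or "Untitled",
        })
    return (feedtitle, articles)


# Hypothetical anchors as they might be scraped from the index page.
sample = [
    {"href": "/read/1", "title": "First article"},
    {"href": "/read/2"},
    {"title": "No link here"},
]
feed = build_feed("Instapaper Unread", sample)
```

In the recipe itself the same checks would be written as `item.get('href')` and `item.get('title')` on the soup tags.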
#11 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2014
Device: Kindle DX
|
|
#12 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2014
Device: Kobo Glo
|
Quote:
I went through many websites and forum posts to arrive here. Thanks for finding a way to solve the issue! This worked well! |
|