Old 05-25-2011, 12:31 PM   #76
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
Mark items read

Quote:
Originally Posted by der_joh
I found this in the bug-tracking system. Does the recipe currently work?
http://oldbugs.calibre-ebook.com/ticket/7581

Marking downloaded feeds as read would be more than nice ;-)
I have taken the script and played with it for quite some time now, but I couldn't figure out how to get it working.

I get a 401 error on the second-to-last line. The article.id doesn't seem to fit, and I couldn't figure out how to get the right id for the articles. article.id has the form {'original-id': u'http://www.blogurl.com/?p=38615', 'gr:original-id': u'http://www.blogurl.com/?p=38615'}. Does this work, or is the problem somewhere else?

Code:
    def build_request(self, url):
        req = mechanize.Request(url)
        req.add_header('Authorization', 'GoogleLogin auth=%s' % self.auth)
        return req

    def article_downloaded(self, request, result):
        br = self.browser
        req = self.build_request(self.mark_as_read_url)
        if not result[2]:
            # Mark article as downloaded.
            article = request.article
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"),
                ("ac", "edit-tags"), ("T", self.token)])
            br.open(self.mark_as_read_url, fields)
        return BasicNewsRecipe.article_downloaded(self, request, result)
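One detail worth noting in the snippet above: build_request prepares a Request that carries the Authorization header, but it is never used; the final br.open posts to the bare URL and relies on whatever headers the browser itself carries. If the 401 comes from that call, a variant worth trying is to post through the prepared request instead. A minimal sketch, assuming mechanize's opener accepts a Request object plus a data string the same way urllib2's does:

Code:
    def article_downloaded(self, request, result):
        if not result[2]:
            # Mark the article as read, posting through the pre-authorized Request
            # so the GoogleLogin Authorization header goes out with the call.
            article = request.article
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"),
                ("ac", "edit-tags"), ("T", self.token)])
            req = self.build_request(self.mark_as_read_url)
            self.browser.open(req, fields)
        return BasicNewsRecipe.article_downloaded(self, request, result)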
Old 05-26-2011, 08:24 PM   #77
loureiro
Junior Member
 
Posts: 2
Karma: 10
Join Date: May 2011
Device: Nook
Possible solution to 401 / Optimization

I was getting the 401 error too, and I think the cause is that the get_browser function runs for each request and authenticates against Google every time. Too many authentications = 401. I got the recipe working by having get_browser always return the same browser.

Something like this:


Code:
    mybr = None

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)

        if self.mybr is not None:
            return self.mybr

        if self.username is not None and self.password is not None:
            request = urllib.urlencode([('Email', self.username), ('Passwd', self.password),
                                        ('service', 'reader'), ('accountType', 'HOSTED_OR_GOOGLE'), ('source', __appname__)])
            response = br.open('https://www.google.com/accounts/ClientLogin', request)
            auth = re.search('Auth=(\S*)', response.read()).group(1)
            cookies = mechanize.CookieJar()
            br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
            br.addheaders = [('Authorization', 'GoogleLogin auth='+auth)]
            self.mybr = br

        return br
Old 05-27-2011, 09:19 AM   #78
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
@loureiro The problem is the same with the adaptation in the get_browser method.
Now I always get a 400 error, no matter which version of the get_browser method I use:
Code:
  File "/tmp/calibre_0.8.2_tmp_EKxFgD/calibre_0.8.2_APXqCL_recipes/recipe0.py", line 57, in article_downloaded
    br.open(self.mark_as_read_url, fields)
  File "site-packages/mechanize/_opener.py", line 204, in open
  File "site-packages/mechanize/_urllib2_fork.py", line 457, in http_response
  File "site-packages/mechanize/_opener.py", line 227, in error
  File "site-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "site-packages/mechanize/_urllib2_fork.py", line 477, in http_error_default
HTTPError: HTTP Error 400: Bad Request
The recipe runs fine if there is no custom article_downloaded method.

Shouldn't the id argument have a format like tag:google.com,2005:reader/item/041fa4grw5de72c9 when marking the article read? I can't find any id like that.
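As far as I know, the long and short forms of a Reader item id are just two encodings of the same 64-bit value: the long form is "tag:google.com,2005:reader/item/" followed by the unsigned value as 16 zero-padded hex digits, and the short form is the signed decimal. A minimal sketch of the conversion, assuming that scheme holds (the function names are mine, not part of the recipe):

Code:
def short_to_long(short_id):
    # signed decimal id -> "tag:google.com,2005:reader/item/" + 16 hex digits
    return 'tag:google.com,2005:reader/item/%016x' % (int(short_id) & 0xFFFFFFFFFFFFFFFF)

def long_to_short(long_id):
    # take the hex part after the last '/' and re-apply the sign
    value = int(long_id.rsplit('/', 1)[-1], 16)
    if value >= 2 ** 63:
        value -= 2 ** 64
    return str(value)

Neither form is what article.id holds here, though: the dict above only carries the blog's original-id, so the Reader item id would presumably have to be taken from the <id> element of the Atom feed entries themselves.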
Old 05-27-2011, 10:33 AM   #79
loureiro
Junior Member
 
Posts: 2
Karma: 10
Join Date: May 2011
Device: Nook
Quote:
Originally Posted by jens32
@loureiro The problem is the same with the adaptation in the get_browser method. Now I always get a 400 error, no matter which version of the get_browser method I use.
using "my version" of get_browser try changing this:


Code:
 
def article_downloaded(self, request, result): 
        br = self.browser 
        req = self.build_request(self.mark_as_read_url) 
        if not result[2]: 
            # Mark article as downloaded. 
            article = request.article 
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"), 
                ("ac", "edit-tags"), ("T", self.token)]) 
            br.open(self.mark_as_read_url, fields) 
        return BasicNewsRecipe.article_downloaded(self, request, result)
to this:


Code:
 
def article_downloaded(self, request, result): 
        br = mybr # <----
        req = self.build_request(self.mark_as_read_url) 
        if not result[2]: 
            # Mark article as downloaded. 
            article = request.article 
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"), 
                ("ac", "edit-tags"), ("T", self.token)]) 
            br.open(self.mark_as_read_url, fields) 
        return BasicNewsRecipe.article_downloaded(self, request, result)
Old 05-27-2011, 10:49 AM   #80
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
Quote:
Originally Posted by loureiro
Using "my version" of get_browser, try changing this:

Code:
 
        br = mybr # <----
I already tried it with self.mybr, but I get the same Bad Request result.
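One way to see why Google is rejecting the call: the HTTPError that mechanize raises doubles as a response object, so its body (which usually names the parameter Reader did not like, e.g. a bad token or id) can be read and logged before re-raising. A rough sketch of wrapping the br.open call inside article_downloaded, using the recipe's own self.log:

Code:
            try:
                br.open(self.mark_as_read_url, fields)
            except mechanize.HTTPError as e:
                # the body of the 400 response normally explains what was wrong
                self.log.warn('edit-tags rejected: %s' % e.read())
                raise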
Old 06-02-2011, 10:57 AM   #81
vangop
Member
 
Posts: 15
Karma: 910
Join Date: Jun 2011
Device: kindle
Both built-in Google Reader recipes didn't work for me: the articles were not downloaded, only the summaries.
I tried to load the feed directly by creating a new recipe, but that didn't work at all; the result was empty. I'm not sure how to debug or log inside recipes, so to my great frustration I couldn't figure out what was wrong.
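In case it helps with the debugging part: a recipe can be run outside the GUI with ebook-convert myrecipe.recipe .epub --test -vv, which prints the recipe's log to the terminal, and anything written through self.log shows up there as well. A small sketch of the kind of logging I would drop into a recipe to see whether the feed is coming back empty:

Code:
    def parse_feeds(self):
        # Log what actually came back from each feed, so an empty result is
        # visible in the ebook-convert output instead of failing silently
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            self.log.info('feed %r has %d articles' % (feed.title, len(feed.articles)))
        return feeds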
Old 06-07-2011, 01:43 PM   #82
crivicris
Junior Member
 
Posts: 2
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 wifi+3G
Hi

I've just found your recipe and I'm going to give it a try.

But have you thought about merging it with this:


Quote:
Originally Posted by Pahan
Here is a recipe template that keeps track of already downloaded feed items and only downloads items that it hasn't seen before or whose description, content, or URL has changed. It does so by overriding the parse_feeds method.
Some caveats:
  • I recommend setting max_articles_per_feed and oldest_article to very high values. The first time, the recipe will download every item in every feed, but after that it will "remember" not to do it again and will grab all new articles, no matter how much time has elapsed since the last run or how many entries have been added. In particular, if you set max_articles_per_feed to a small value and the feed lists all articles in a particular order, you might never see new articles.
  • The list of items downloaded for each feed will be stored in "Calibre configuration directory/recipes/recipe_storage/Recipe title/Feed title". This is probably suboptimal, and there ought to be a persistent storage API for recipes, but it's the best I could come up with.
  • The list of items downloaded is written to disk before the items are actually downloaded. Thus, if an item fails to download for some reason, the recipe won't know, and will not try to download it again. This could probably be fixed by writing the new item lists to temporary files and overriding some method later in the sequence to "commit" by overwriting the downloaded item lists with the new lists. (Thus, if the recipe fails before that point, it never gets there, the old lists remain intact, and the items are re-downloaded the next time the recipe is run.) A rough sketch of this appears after the code below.
  • If there are no new items to download and remove_empty_feeds is set to True, the recipe will return an empty list of feeds, which will cause Calibre to raise an error. As far as I can tell, there is nothing the recipe can do about that without a lot more coding.
  • I've tried to make this code portable, but I've only tested it on Linux systems, so let me know if it doesn't work on other platforms. I am particularly unsure about newline handling.
Spoiler:
Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title          = u'Unknown News Source'
    oldest_article = 10000
    max_articles_per_feed = 10000
    feeds          = [ ]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
                with file(feed_fn) as f:
                    for h in f:
                        past_items.add(h.strip())
                       
            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content: item_hash.update(article.content.encode('utf-8'))
                if article.summary: item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with file(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and
                self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds
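Picking up on the "commit" caveat above: the idea would be for parse_feeds to write cur_items to feed_fn + '.new' instead of feed_fn and remember feed_fn in a list such as self._pending_commits (my name, not something calibre defines), then promote the side files only once the download has actually produced a book. A rough sketch of the commit half, using postprocess_book as my guess at a suitable "method later in the sequence", since it is only reached after the downloaded content has been assembled; it relies on the os import already at the top of the template.

Code:
    def postprocess_book(self, oeb, opts, log):
        # Promote the new item lists written by parse_feeds. If the run died
        # before this point, the old lists are still in place and the missed
        # items will simply be downloaded again on the next run.
        for feed_fn in getattr(self, '_pending_commits', []):
            tmp = feed_fn + '.new'
            if os.path.exists(tmp):
                if os.path.exists(feed_fn):
                    os.remove(feed_fn)  # os.rename() will not overwrite an existing file on Windows
                os.rename(tmp, feed_fn)
        return BasicNewsRecipe.postprocess_book(self, oeb, opts, log)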
Old 06-17-2011, 04:39 PM   #83
kobe2
Junior Member
 
Posts: 1
Karma: 10
Join Date: Jun 2011
Device: kindle
Starred notes

Hi guys,

I am trying to adapt the recipe to get a mobi from 'starred notes', but after a couple of days of fighting I still have no luck :/ Can you help me? My adapted recipe looks like this (take a look at the last line):
Code:
import urllib, re, mechanize
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre import __appname__

class GoogleReaderUber(BasicNewsRecipe):
    title   = 'Google Reader - Read It Later'
    description = '...'
    needs_subscription = True
    __author__ = 'davec, rollercoaster, Starson17'
    oldest_article = 365
    max_articles_per_feed = 100
    use_embedded_content = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            request = urllib.urlencode([('Email', self.username), ('Passwd', self.password),
                                        ('service', 'reader'), ('accountType', 'HOSTED_OR_GOOGLE'), ('source', __appname__)])
            response = br.open('https://www.google.com/accounts/ClientLogin', request)
            auth = re.search('Auth=(\S*)', response.read()).group(1)
            cookies = mechanize.CookieJar()
            br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
            br.addheaders = [('Authorization', 'GoogleLogin auth='+auth)]
        return br

    feeds = [(u'ReadItLater', u'http://www.google.com/reader/atom/user/-/state/com.google/starred?n=100')]
The problem is not with starred entries but with starred notes: Calibre takes only the first starred note and seems to forget the others.

My idea is to have all interesting articles in one mobi book. Google Reader provides a great feature named 'Note in Reader »' that makes it quick to store text you have found in Reader.

Thanks for your help.