Old 05-25-2011, 12:31 PM   #76
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
Mark items read

Quote:
Originally Posted by der_joh
I found this in the bug-tracking system. Does the recipe currently work?
http://oldbugs.calibre-ebook.com/ticket/7581

Marking downloaded feeds as read would be more than nice ;-)
I have taken the script and played with it for quite some time now, but I couldn't figure out how to get it working.

I get a 401 error on the second-to-last line. The article.id doesn't seem to fit, and I couldn't figure out how to get the right id for the articles. article.id has the form {'original-id': u'http://www.blogurl.com/?p=38615', 'gr:original-id': u'http://www.blogurl.com/?p=38615'}. Does this work, or is the problem somewhere else?

Code:
    def build_request(self, url):
        req = mechanize.Request(url)
        req.add_header('Authorization', 'GoogleLogin auth=%s' % self.auth)
        return req

    def article_downloaded(self, request, result):
        br = self.browser
        req = self.build_request(self.mark_as_read_url)
        if not result[2]:
            # Mark article as downloaded.
            article = request.article
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"),
                ("ac", "edit-tags"), ("T", self.token)])
            br.open(self.mark_as_read_url, fields)
        return BasicNewsRecipe.article_downloaded(self, request, result)
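One detail worth noting in the snippet above: build_request prepares a Request that carries the Authorization header, but it is never used; the final br.open posts to the bare URL and relies on whatever headers the browser itself carries. If the 401 comes from that call, a variant worth trying is to post through the prepared request instead. A minimal sketch, assuming mechanize's opener accepts a Request object plus a data string the same way urllib2's does:

Code:
    def article_downloaded(self, request, result):
        if not result[2]:
            # Mark the article as read, posting through the pre-authorized Request
            # so the GoogleLogin Authorization header goes out with the call.
            article = request.article
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"),
                ("ac", "edit-tags"), ("T", self.token)])
            req = self.build_request(self.mark_as_read_url)
            self.browser.open(req, fields)
        return BasicNewsRecipe.article_downloaded(self, request, result)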
Old 05-26-2011, 08:24 PM   #77
loureiro
Junior Member
 
Posts: 2
Karma: 10
Join Date: May 2011
Device: Nook
Possible solution to 401 / Optimization

I was getting the 401 error too, and I think the cause is that the get_browser function runs for each request and authenticates against Google every time. Too many authentications = 401. I got the recipe working by having get_browser always return the same browser.

Something like this:


Code:
    mybr = None

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)

        if self.mybr is not None:
            return self.mybr

        if self.username is not None and self.password is not None:
            request = urllib.urlencode([('Email', self.username), ('Passwd', self.password),
                                        ('service', 'reader'), ('accountType', 'HOSTED_OR_GOOGLE'), ('source', __appname__)])
            response = br.open('https://www.google.com/accounts/ClientLogin', request)
            auth = re.search('Auth=(\S*)', response.read()).group(1)
            cookies = mechanize.CookieJar()
            br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
            br.addheaders = [('Authorization', 'GoogleLogin auth='+auth)]
            self.mybr = br

        return br
Old 05-27-2011, 09:19 AM   #78
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
@loureiro The problem is the same with the adaptation in the get_browser method.
Now I always get a 400 error, no matter which version of the get_browser method I use:
Code:
  File "/tmp/calibre_0.8.2_tmp_EKxFgD/calibre_0.8.2_APXqCL_recipes/recipe0.py", line 57, in article_downloaded
    br.open(self.mark_as_read_url, fields)
  File "site-packages/mechanize/_opener.py", line 204, in open
  File "site-packages/mechanize/_urllib2_fork.py", line 457, in http_response
  File "site-packages/mechanize/_opener.py", line 227, in error
  File "site-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "site-packages/mechanize/_urllib2_fork.py", line 477, in http_error_default
HTTPError: HTTP Error 400: Bad Request
The recipe runs fine if there is no custom article_downloaded method.

Shouldn't the id argument have a format like tag:google.com,2005:reader/item/041fa4grw5de72c9 when marking the article read? I can't find any id like that.
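As far as I know, the long and short forms of a Reader item id are just two encodings of the same 64-bit value: the long form is "tag:google.com,2005:reader/item/" followed by the unsigned value as 16 zero-padded hex digits, and the short form is the signed decimal. A minimal sketch of the conversion, assuming that scheme holds (the function names are mine, not part of the recipe):

Code:
def short_to_long(short_id):
    # signed decimal id -> "tag:google.com,2005:reader/item/" + 16 hex digits
    return 'tag:google.com,2005:reader/item/%016x' % (int(short_id) & 0xFFFFFFFFFFFFFFFF)

def long_to_short(long_id):
    # take the hex part after the last '/' and re-apply the sign
    value = int(long_id.rsplit('/', 1)[-1], 16)
    if value >= 2 ** 63:
        value -= 2 ** 64
    return str(value)

Neither form is what article.id holds here, though: the dict above only carries the blog's original-id, so the Reader item id would presumably have to be taken from the <id> element of the Atom feed entries themselves.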
Old 05-27-2011, 10:33 AM   #79
loureiro
Junior Member
 
Posts: 2
Karma: 10
Join Date: May 2011
Device: Nook
Quote:
Originally Posted by jens32
@loureiro The problem is the same with the adaptation in the get_browser method. Now I always get a 400 error, no matter which version of the get_browser method I use.
using "my version" of get_browser try changing this:


Code:
 
def article_downloaded(self, request, result): 
        br = self.browser 
        req = self.build_request(self.mark_as_read_url) 
        if not result[2]: 
            # Mark article as downloaded. 
            article = request.article 
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"), 
                ("ac", "edit-tags"), ("T", self.token)]) 
            br.open(self.mark_as_read_url, fields) 
        return BasicNewsRecipe.article_downloaded(self, request, result)
to this:


Code:
 
def article_downloaded(self, request, result): 
        br = mybr # <----
        req = self.build_request(self.mark_as_read_url) 
        if not result[2]: 
            # Mark article as downloaded. 
            article = request.article 
            fields = urllib.urlencode([("i", article.id), ("a", "user/-/state/com.google/read"), 
                ("ac", "edit-tags"), ("T", self.token)]) 
            br.open(self.mark_as_read_url, fields) 
        return BasicNewsRecipe.article_downloaded(self, request, result)
Old 05-27-2011, 10:49 AM   #80
jens32
Junior Member
 
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3
Quote:
Originally Posted by loureiro
Using "my version" of get_browser, try changing this:

Code:
 
        br = mybr # <----
I already tried it with self.mybr, but I get the same Bad Request result.
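One way to see why Google is rejecting the call: the HTTPError that mechanize raises doubles as a response object, so its body (which usually names the parameter Reader did not like, e.g. a bad token or id) can be read and logged before re-raising. A rough sketch of wrapping the br.open call inside article_downloaded, using the recipe's own self.log:

Code:
            try:
                br.open(self.mark_as_read_url, fields)
            except mechanize.HTTPError as e:
                # the body of the 400 response normally explains what was wrong
                self.log.warn('edit-tags rejected: %s' % e.read())
                raise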
Old 06-02-2011, 10:57 AM   #81
vangop
Member
 
Posts: 15
Karma: 910
Join Date: Jun 2011
Device: kindle
Both built-in Google Reader recipes didn't work for me: the articles were not downloaded, only the summaries.
I tried to load the feed directly by creating a new recipe, but that didn't work at all; the result was empty. I'm not sure how to debug or log inside recipes, so to my great frustration I couldn't figure out what was wrong.
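In case it helps with the debugging part: a recipe can be run outside the GUI with ebook-convert myrecipe.recipe .epub --test -vv, which prints the recipe's log to the terminal, and anything written through self.log shows up there as well. A small sketch of the kind of logging I would drop into a recipe to see whether the feed is coming back empty:

Code:
    def parse_feeds(self):
        # Log what actually came back from each feed, so an empty result is
        # visible in the ebook-convert output instead of failing silently
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            self.log.info('feed %r has %d articles' % (feed.title, len(feed.articles)))
        return feeds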
Old 06-07-2011, 01:43 PM   #82
crivicris
Junior Member
 
Posts: 2
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 wifi+3G
Hi

I've just found your recipe and I'm going to give it a try.

But have you thought about merging it with this:


Quote:
Originally Posted by Pahan
Here is a recipe template that keeps track of already downloaded feed items and only downloads items that it hasn't seen before or whose description, content, or URL has changed. It does so by overriding the parse_feeds method.
Some caveats:
  • I recommend setting max_articles_per_feed and oldest_article to very high values. The first time, the recipe will download every item in every feed, but after that it will "remember" not to do it again and will grab all new articles, no matter how much time has elapsed since the last run or how many entries have been added. In particular, if you set max_articles_per_feed to a small value and the feed lists all articles in a particular order, you might never see new articles.
  • The list of items downloaded for each feed will be stored in "Calibre configuration directory/recipes/recipe_storage/Recipe title/Feed title". This is probably suboptimal, and there ought to be a persistent storage API for recipes, but it's the best I could come up with.
  • The list of items downloaded is written to disk before the items are actually downloaded. Thus, if an item fails to download for some reason, the recipe won't know, and will not try to download it again. This could probably be fixed by writing the new item lists to temporary files and overriding some method later in the sequence to "commit" by overwriting the downloaded item lists with the new lists. (Thus, if the recipe fails before that point, it never gets there, the old lists remain intact, and the items are re-downloaded the next time the recipe is run.) A rough sketch of this appears after the code below.
  • If there are no new items to download and remove_empty_feeds is set to True, the recipe will return an empty list of feeds, which will cause Calibre to raise an error. As far as I can tell, there is nothing the recipe can do about that without a lot more coding.
  • I've tried to make this code portable, but I've only tested it on Linux systems, so let me know if it doesn't work on other platforms. I am particularly unsure about newline handling.
Spoiler:
Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title          = u'Unknown News Source'
    oldest_article = 10000
    max_articles_per_feed = 10000
    feeds          = [ ]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
                with file(feed_fn) as f:
                    for h in f:
                        past_items.add(h.strip())
                       
            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content: item_hash.update(article.content.encode('utf-8'))
                if article.summary: item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with file(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and
                self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds
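Picking up on the "commit" caveat above: the idea would be for parse_feeds to write cur_items to feed_fn + '.new' instead of feed_fn and remember feed_fn in a list such as self._pending_commits (my name, not something calibre defines), then promote the side files only once the download has actually produced a book. A rough sketch of the commit half, using postprocess_book as my guess at a suitable "method later in the sequence", since it is only reached after the downloaded content has been assembled; it relies on the os import already at the top of the template.

Code:
    def postprocess_book(self, oeb, opts, log):
        # Promote the new item lists written by parse_feeds. If the run died
        # before this point, the old lists are still in place and the missed
        # items will simply be downloaded again on the next run.
        for feed_fn in getattr(self, '_pending_commits', []):
            tmp = feed_fn + '.new'
            if os.path.exists(tmp):
                if os.path.exists(feed_fn):
                    os.remove(feed_fn)  # os.rename() will not overwrite an existing file on Windows
                os.rename(tmp, feed_fn)
        return BasicNewsRecipe.postprocess_book(self, oeb, opts, log)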
Old 06-17-2011, 04:39 PM   #83
kobe2
Junior Member
 
Posts: 1
Karma: 10
Join Date: Jun 2011
Device: kindle
Starred notes

Hi guys,

I am trying to adapt the recipe to get a mobi from 'starred notes', but after a couple of days of fighting I still have no luck :/ Can you help me? My adapted recipe looks like this (take a look at the last line):
Code:
import urllib, re, mechanize
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre import __appname__

class GoogleReaderUber(BasicNewsRecipe):
    title   = 'Google Reader - Read It Later'
    description = '...'
    needs_subscription = True
    __author__ = 'davec, rollercoaster, Starson17'
    oldest_article = 365
    max_articles_per_feed = 100
    use_embedded_content = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            request = urllib.urlencode([('Email', self.username), ('Passwd', self.password),
                                        ('service', 'reader'), ('accountType', 'HOSTED_OR_GOOGLE'), ('source', __appname__)])
            response = br.open('https://www.google.com/accounts/ClientLogin', request)
            auth = re.search('Auth=(\S*)', response.read()).group(1)
            cookies = mechanize.CookieJar()
            br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
            br.addheaders = [('Authorization', 'GoogleLogin auth='+auth)]
        return br

    feeds = [(u'ReadItLater', u'http://www.google.com/reader/atom/user/-/state/com.google/starred?n=100')]
The problem is not with starred entries but with starred notes: Calibre takes only the first starred note and seems to forget the others.

My idea is to have all interesting articles in one mobi book. Google Reader provides a great feature named 'Note in Reader »' that makes it quick to store text you have found in Reader.

Thanks for your help.