Old 07-03-2010, 04:38 PM   #2236
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
New recipe of recipes: BigOven

Food Recipes (to go with my Epicurious recipe of recipes). Registration at the site is free. Registered users receive larger photos, so I decided to require a username/password in the recipe. If you don't want to register, just use a fake username and password. The recipe will still work, but you'll get the smaller photos. The username is the email address you registered under at the bigoven.com site.
Attached Files
File Type: zip BigOven.zip (1.1 KB, 296 views)
Old 07-03-2010, 05:12 PM   #2237
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by schnortz View Post
The recipe I am using is the following (modified with your suggested change, even if it was unsuccessful). Hope I'm not violating etiquette by posting the code.
It's not an etiquette violation to post it; that's what this thread is for. However, if you post it again with code tags around it (use the hash # symbol), the indents will be preserved and I'll test it for you.

If you really want to be nice to the thread, also use the spoiler tag (eye with an X), which collapses the code so it takes less space.

Quote:
And as requested... here is a link
I looked at the links. Your preprocess_regexps looks basically correct now, but it differs in some minor ways from how I normally write it. That may be the problem, or the browser may be. I'm not sure how you viewed the source, but a browser sometimes changes the markup slightly, so a pattern written against what you see doesn't match. If you have a problem, the best approach is to print the soup in postprocess_html. I'll test that if you post your recipe with the indents in code tags.
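A pattern can also be tried by hand on a saved copy of the page source before running a full download. The following is only a minimal sketch, assuming each preprocess_regexps entry is a (compiled pattern, replacement function) pair applied in order with sub(). The helper name apply_rules and the sample HTML are made up for illustration and are not part of calibre or the site.

```python
import re

def apply_rules(html, rules):
    """Apply (compiled_pattern, replacement_function) pairs in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# Hypothetical page fragment, not taken from the real site.
sample = '<p></p><div class="ad">junk</div><p>Story text</p><div>footer</div>'

# Greedy version: '.*' with re.DOTALL runs to the LAST </div> in the input.
greedy = [(re.compile(r'<p></p><div.*</div>', re.IGNORECASE | re.DOTALL),
           lambda match: '')]

# Non-greedy version: '.*?' stops at the FIRST </div> after the match start.
lazy = [(re.compile(r'<p></p><div.*?</div>', re.IGNORECASE | re.DOTALL),
         lambda match: '')]

print(repr(apply_rules(sample, greedy)))
print(repr(apply_rules(sample, lazy)))
```

On this sample the greedy pattern removes everything, story text included, while the non-greedy one removes only the first div. Whether that matters for a given page depends on where the last closing div sits in the real source.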

As to the "Photo" issue: you want to skip articles that have that text in the link. I only know one way to do that; perhaps someone else knows another. I know two ways to follow articles: follow all the links in the automatically parsed feed, or build your own feed (without the Photo links) with parse_index and then follow all of those links. Only the second lets you leave links out.

If there's another way to follow some links but not others, I don't know it. As I posted, I had hoped at one time that filter_regexps would do that job, but I never got it to work. I suspect that it only works on recursed links, not the main article link.

Do you want details on how to use parse_index? Either way, you should start here.
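As a rough illustration of the second approach, here is a sketch of the filtering step on its own. It assumes parse_index is expected to return a list of (feed title, list of article dicts) tuples, with each dict carrying at least 'title' and 'url'. The function name build_index, the example links, and the choice to check both the title and the URL for the text are assumptions for the example; in a real recipe the links would be scraped from the section page with index_to_soup.

```python
def build_index(feed_title, links, skip_text='Photo'):
    """Return a parse_index-style result, leaving out unwanted links.

    links is a list of (title, url) pairs.
    """
    articles = []
    for title, url in links:
        if skip_text in url or skip_text in title:
            continue  # drop photo galleries before they reach the feed
        articles.append({'title': title, 'url': url,
                         'description': '', 'date': ''})
    # An empty feed is left out entirely.
    return [(feed_title, articles)] if articles else []

# Hypothetical links for illustration only.
links = [
    ('Council votes on budget', 'http://example.com/news/budget'),
    ('Photo: Parade downtown', 'http://example.com/Photo/parade'),
]
index = build_index('Local News', links)
print(index)
```

Only the first link survives, so only that article would be followed.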
Old 07-03-2010, 05:34 PM   #2238
elsuave
Member
Posts: 12
Karma: 10
Join Date: Jun 2010
Device: Nook
Updated Recipe for O Estado de S. Paulo

Hey guys, I just noticed that the recipe for O Estado de S. Paulo is broken (Calibre 0.7.7). I have attempted to update it. Because I have limited coding experience (and know no Python at all), the code may not be efficient; if a more experienced recipe-maker would like to double-check it, I'd appreciate it.

It appears to be getting the job done, though.

Code:
Spoiler:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2010, elsuave'
'''
estadao.com.br
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Estadao(BasicNewsRecipe):
    title                 = 'O Estado de S. Paulo'
    __author__            = 'elsuave (modified from Darko Miletic)'
    description           = 'News from Brasil in Portuguese'
    publisher             = 'O Estado de S. Paulo'
    category              = 'news, politics, Brasil'
    oldest_article        = 2
    max_articles_per_feed = 25
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf8'
    cover_url             = 'http://www.estadao.com.br/img/logo_estadao.png'
    remove_javascript     = True

    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    keep_only_tags = [
                          dict(name='div', attrs={'class':['bb-md-noticia','c5']})
                     ]

    remove_tags = [
                     dict(name=['script','object','form','ul'])
                    ,dict(name='div', attrs={'class':['fnt2 Color_04 bold','right fnt2 innerTop15 dvTmFont','™_01 right outerLeft15','tituloBox','tags']})
                    ,dict(name='div', attrs={'id':['bb-md-noticia-subcom']})
                  ]

    feeds = [
               (u'Manchetes Estadao', u'http://www.estadao.com.br/rss/manchetes.xml')
              ,(u'Ultimas noticias', u'http://www.estadao.com.br/rss/ultimas.xml')
              ,(u'Nacional', u'http://www.estadao.com.br/rss/nacional.xml')
              ,(u'Internacional', u'http://www.estadao.com.br/rss/internacional.xml')
              ,(u'Cidades', u'http://www.estadao.com.br/rss/cidades.xml')
              ,(u'Esportes', u'http://www.estadao.com.br/rss/esportes.xml')
              ,(u'Arte & Lazer', u'http://www.estadao.com.br/rss/arteelazer.xml')
              ,(u'Economia', u'http://www.estadao.com.br/rss/economia.xml')
              ,(u'Vida &', u'http://www.estadao.com.br/rss/vidae.xml')
            ]



    language = 'pt'

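    def get_article_url(self, article):
        # Skip multimedia items: returning None is meant to make calibre drop the article
        url = BasicNewsRecipe.get_article_url(self, article)
        if url and '/Multimidia/' not in url:
            return url
        return None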
Attached Files
File Type: zip estadao.py.zip (1.0 KB, 309 views)

Last edited by elsuave; 07-03-2010 at 05:37 PM.
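The get_article_url override in the recipe above filters feed items by URL. Pulled out as a plain function, the check looks like the sketch below. It assumes, as the recipe appears to rely on, that an item whose URL comes back as None is skipped; keep_article and the example URLs are hypothetical names used only for this illustration.

```python
def keep_article(url, blocked='/Multimidia/'):
    """Return the URL to download, or None to skip the item."""
    if url and blocked not in url:
        return url
    return None

# Hypothetical URLs for illustration only.
print(keep_article('http://example.com/noticias/economia,123.htm'))
print(keep_article('http://example.com/Multimidia/video,456.htm'))
print(keep_article(None))
```

The guard on an empty url avoids a TypeError when a feed item has no link at all.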
Old 07-03-2010, 06:43 PM   #2239
elsuave
Member
Posts: 12
Karma: 10
Join Date: Jun 2010
Device: Nook
Updated Recipe for Editor & Publisher

Another update for a recipe that's currently broken: Editor & Publisher (Calibre 0.7.7). Please modify if there are better ways to accomplish the task.

Code:
Spoiler:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2010 elsuave'

from calibre.web.feeds.news import BasicNewsRecipe

class EandP(BasicNewsRecipe):
    title              = u'Editor and Publisher'
    __author__         = u'elsuave (modified from Xanthan Gum)'
    description        = 'News about newspapers and journalism.'
    publisher             = 'Editor and Publisher'
    category              = 'news, journalism, industry'
    language = 'en'
    max_articles_per_feed = 25
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf8'
    cover_url             = 'http://www.editorandpublisher.com/images/EP_main_logo.gif'
    remove_javascript     = True

    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'

    # Font formatting code borrowed from kwetal

    extra_css = '''
                 body{font-family:verdana,arial,helvetica,geneva,sans-serif ;}
                 h1{font-size: xx-large;}
                 h2{font-size: large;}
                '''

    # Keep only div:itemmgap

    keep_only_tags = [
                          dict(name='div', attrs={'class':'itemmgap'})
                          ]

    # Remove commenting/social media links

    remove_tags_after = [dict(name='div', attrs={'class':'clear'})]


    feeds = [(u'Breaking News', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx'),
             (u'Business News', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx?CategoryId=2'),
             (u'Ad/Circ News', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx?CategoryId=3'),
             (u'Newsroom', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx?CategoryId=4'),
             (u'Technology News', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx?CategoryId=5'),
             (u'Syndicates News', u'http://www.editorandpublisher.com/GenerateRssFeed.aspx?CategoryId=7')]
Attached Files
File Type: zip editorandpub.py.zip (991 Bytes, 329 views)
Old 07-03-2010, 09:27 PM   #2240
sibermage
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Sony PRS600
Quote:
Originally Posted by rty View Post
Here it is: Recipe for SINGTAO DAILY CANADA

Language: Chinese (Traditional)
Tested OK on B&N Nook e-reader.
Gave it a try and it's giving me this error:

ERROR: Invalid input: <p>Could not create recipe. Error:<br>invalid syntax (recipe3.py, line 4)
Old 07-04-2010, 01:02 AM   #2241
rty
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by sibermage View Post
Gave it a try and it's giving me this error:

ERROR: Invalid input: <p>Could not create recipe. Error:<br>invalid syntax (recipe3.py, line 4)
Are you sure you picked up the right recipe? The recipe inside the zip file is named "Singtao Daily.py", not "recipe3.py".
Old 07-04-2010, 01:07 AM   #2242
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by rty View Post
Are you sure you picked up the right recipe? The recipe inside the zip file is named "Singtao Daily.py", not "recipe3.py".
He has the right file. When you try to load this recipe from your file you get:

Quote:
ERROR: Invalid input: <p>Could not create recipe. Error:<br>invalid syntax (recipe91.py, line 4)
The number after recipe increments with each attempt.
Old 07-04-2010, 01:48 AM   #2243
schnortz
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2010
Device: Nook
The Appleton Post Crescent Recipe - Take Two
Hope I did this right

Spoiler:
Code:
#!/usr/bin/env python
__license__   = 'GPL v3'
__copyright__ = '2009 Kovid Goyal <kovid at kovidgoyal.net>'

import re

from calibre.web.feeds.news import BasicNewsRecipe

class AppletonPostCrescent(BasicNewsRecipe):
    title          = u'Appleton Post Crescent'
    oldest_article = 2
    language = 'en'

    __author__     = 'Joseph Kitzmiller and Sujata Raman'
    max_articles_per_feed = 25
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding = 'cp1252'
    cover_url  = u'http://www.postcrescent.com/ic/assets/frontpage.pdf'
    publisher              = 'Appleton Post Crescent, Gannett'
    category               = 'news, Appleton, Fox Cities, Wisconsin'

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-size:large; color:#0E5398; }
                    h2{color:#666666;}
                   .blog_title{color:#4E0000; font-family:Georgia,"Times New Roman",Times,serif; font-size:large;}
                   .sidebar-photo{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:30%;}
                   .blog_post{font-family:Arial,Helvetica,sans-serif; color:#222222; font-size:xx-small;}
                   .article-bodytext{font-family:Arial,Helvetica,sans-serif; font-size:xx-small; color:#222222;font-weight:normal;}
                   .ratingbyline{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:50%;}
                   .author{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
                   .date{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
                   .padding{font-family:Arial,Helvetica,sans-serif; font-size:70%; color:#222222; font-weight:normal;}
                    '''

    preprocess_regexps = [
                         (re.compile(r'<p></p><div.*</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
                         ]
				
    keep_only_tags = [dict(name='div', attrs={'class':['padding','sidebar-photo']})]

    remove_tags = [ dict(name=['object','link','table','embed','script', 'noscript'])
                    ,dict(name='div',attrs={'id':["pluckcomments","StoryChat"]})
                    ,dict(name='div',attrs={'class':['article-tools',"padding article-sidebar",'articleflex-container','poster-container','newslist','footer-container','sidebar-related','sub']})
                    ,dict(name='p',attrs={'class':['posted','tags']})]

    feeds = [(u'Breaking News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSbreaking.pbs&mime=xml'),
             (u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'),
             (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'),
             (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml'),
             (u'Buzz Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9a8980f0-f726-439c-8c4e-1dc0f788941e'),
             (u'Weekend Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9dbf4deb-0468-41b7-a0c7-3a777c03d64c')]
				

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup


As for the API page you referenced, I did look it over. I, too, tried using filter_regexps, to no avail. I'll admit I haven't thoroughly studied that page, thanks to a combination of confusion, frustration and tiredness. However, if you're still willing to share your expertise on parse_index, that would be wonderful.

Edit: FYI, I've been studying the pages' HTML using Firebug in Firefox, if that helps.
Old 07-04-2010, 02:48 AM   #2244
rty
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by dwanthny View Post
He has the right file. When you try to load this recipe from your file you get:
Ok, I'll take a look at it tonight when I reach home.

The content of the recipe is below:

Spoiler:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
class AdvancedUserRecipe1278063072(BasicNewsRecipe):
    title          = u'Singtao Daily - Canada'
    oldest_article = 7
    max_articles_per_feed = 100
    __author__            = 'rty'
    description           = 'Toronto Canada Chinese Newspaper'
    publisher             = 'news.singtao.ca'
    category              = 'Chinese, News, Canada'
    remove_javascript = True
    use_embedded_content   = False
    no_stylesheets = True
    language = 'cn-HK'
    conversion_options = {'linearize_tables':True} 
    masthead_url = 'http://news.singtao.ca/i/site_2009/logo.jpg'
    extra_css = '''
        @font-face {font-family: "DroidFont", serif, sans-serif; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {text-align: justify; margin-right: 8pt; font-family: 'DroidFont', serif;}
        h1 {font-family: 'DroidFont', serif;}
        .articledescription {font-family: 'DroidFont', serif;}
    '''
    keep_only_tags = [
        dict(name='div', attrs={'id':['title','storybody']}),
        dict(name='div', attrs={'class':'content'})
    ]

    def parse_index(self):
        # Build one feed per section page; sections with no articles are left out
        feeds = []
        for title, url in [
            ('Editorial', 'http://news.singtao.ca/toronto/editorial.html'),
            ('Toronto 城市/社區', 'http://news.singtao.ca/toronto/city.html'),
            ('Canada 加國', 'http://news.singtao.ca/toronto/canada.html'),
            ('Entertainment', 'http://news.singtao.ca/toronto/entertainment.html'),
            ('World', 'http://news.singtao.ca/toronto/world.html'),
            ('Finance 國際財經', 'http://news.singtao.ca/toronto/finance.html'),
            ('Sports', 'http://news.singtao.ca/toronto/sports.html'),
        ]:
            articles = self.parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def parse_section(self, url):
        soup = self.index_to_soup(url)
        div = soup.find(attrs={'class': ['newslist paddingL10T10','newslist3 paddingL10T10']})
        current_articles = []
        if div is None:
            # No recognised article list on this section page; skip it
            return current_articles
        #date = div.find(attrs={'class': 'underlineBLK'})
        for li in div.findAll('li'):
            a = li.find('a', href=True)
            if a is None:
                continue
            title = self.tag_to_string(a)
            url = a.get('href', False)
            if not url or not title:
                continue
            # Turn site-relative links into absolute URLs
            if url.startswith('/'):
                url = 'http://news.singtao.ca' + url
            #self.log('\t\tFound article:', title)
            #self.log('\t\t\t', url)
            current_articles.append({'title': title, 'url': url, 'description':''})

        return current_articles

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
           del item['style']
        for item in soup.findAll(width=True):
           del item['width']
        return soup
Old 07-04-2010, 05:17 AM   #2245
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by rty View Post
Ok, I'll take a look at it tonight when I reach home.

I looked at it and nothing jumped out at me. I even loaded a different recipe to see if that would load and it did. Maybe it is some kind of encoding thing on my machine? When I viewed the recipe in Wordpad I was able to see the Chinese characters but when I opened it with Notepad++ I could not view them.
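One quick way to rule an encoding problem in or out is to check whether the raw bytes of the recipe file decode cleanly as UTF-8. This is only a sketch: check_utf8 is a made-up helper, the sample strings stand in for the real file, and it assumes the recipe is meant to be saved as UTF-8.

```python
def check_utf8(data):
    """Return (ok, detail) for the raw bytes of a recipe file."""
    has_bom = data.startswith(b'\xef\xbb\xbf')
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        return False, 'not valid UTF-8 at byte %d' % err.start
    return True, 'valid UTF-8' + (' with BOM' if has_bom else '')

# Stand-ins for file contents: the same line saved in two encodings.
good = u"title = u'Canada 加國'\n".encode('utf-8')
bad = u"title = u'Canada 加國'\n".encode('big5')

print(check_utf8(good))
print(check_utf8(bad))
```

For a file on disk, the bytes would come from opening it in binary mode, for example open(path, 'rb').read(), where path is wherever the recipe was saved.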
Old 07-04-2010, 09:46 AM   #2246
kovidgoyal
creator of calibre
Posts: 45,396
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@rty: I've added a fixed version of Singtao to the calibre repository; you can get it from there. Unfortunately, I don't remember what the fix was.
Old 07-04-2010, 10:00 AM   #2247
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Google Changed Authentication for Google Reader

In another thread many folk have mentioned that their Google Reader recipe has stopped working. The error they experience is HTTPError: HTTP Error 401: Unauthorized.

Spoiler:
Code:
ERROR: Conversion Error: <b>Failed</b>: Fetch news from Google Reader

Fetch news from Google Reader
Resolved conversion options
calibre version: 0.7.7
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 16.0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'dont_split_on_page_breaks': True,
 'extra_css': None,
 'extract_to': None,
 'flow_size': 260,
 'font_size_mapping': None,
 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x03D10590>,
 'insert_blank_line': False,
 'insert_metadata': False,
 'isbn': None,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'max_toc_links': 50,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.SonyReader300Output object at 0x03D10930>,
 'page_breaks_before': None,
 'password': '**********',
 'prefer_metadata_cover': False,
 'preprocess_html': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_first_image': False,
 'remove_footer': False,
 'remove_header': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'series': None,
 'series_index': None,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'use_auto_toc': False,
 'username': '****.*******',
 'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
   (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 85, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 815, in run
  File "site-packages\calibre\customize\conversion.py", line 211, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 104, in convert
  File "site-packages\calibre\web\feeds\news.py", line 705, in download
  File "site-packages\calibre\web\feeds\news.py", line 835, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 1280, in parse_feeds
  File "c:\docume~1\dell\locals~1\temp\calibre_0.7.7_adbecz_recipes\recipe0.py", line 35, in get_feeds
    soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
  File "site-packages\calibre\web\feeds\news.py", line 474, in index_to_soup
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 202, in open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 612, in http_response
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 225, in error
  File "urllib2.py", line 367, in _call_chain
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 633, in http_error_default
urllib2.HTTPError: HTTP Error 401: Unauthorized

Below is a snippet of conversation from the other thread. I think I may have pointed to the reason for the error but unfortunately I lack the knowledge/experience needed to help these folks.

Quote:
Originally Posted by dwanthny View Post
Quote:
Originally Posted by depend View Post
I start getting the same error too. It was good on 6/20. I don't remember exactly which version I was using since there have been quite a few new versions published.
Version does not matter since the recipe never changed. Apparently something in Google Reader changed and the recipe needs to be fixed to match.

This post entitled "Changes to sending authenticated requests to Google Reader" on the Google Reader Blog might hold the key but I don't have the skill set needed to correct the problem.
Quote:
Originally Posted by Starson17 View Post
Looking at the code in the first post of this thread convinces me that your link does hold the key. The current recipe finds and uses the SID cookie. Google stopped using SID cookie authentication and now wants an AUTH header. I'd ask for help in the dedicated recipe thread and make sure you include the link above. I don't use GoogleReader and I've only played around with adding headers and mechanize once.
I hope someone here can work out this piece of the puzzle and get this recipe working again.
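To make the change described in the quoted conversation concrete, here is a sketch of the header-building step only, with no network access. It rests on assumptions that should be checked against the Google Reader blog post mentioned above: that the login response is a plain-text body with one KEY=value entry per line, that one of those entries is named Auth, and that the header takes the form "GoogleLogin auth=<token>". The function names and the tokens are made up.

```python
def parse_login_response(body):
    """Turn a 'KEY=value' per-line response body into a dict."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition('=')
        if sep:  # ignore lines without an '='
            fields[key.strip()] = value.strip()
    return fields

def auth_header(fields):
    """Build the header pair to send in place of the SID cookie (assumed format)."""
    return ('Authorization', 'GoogleLogin auth=' + fields['Auth'])

# Made-up tokens; a real body would come from the login request.
sample = 'SID=aaa111\nLSID=bbb222\nAuth=ccc333\n'
print(auth_header(parse_login_response(sample)))
```

In a recipe, a pair like this would presumably be attached to the browser object before index_to_soup is called, for example through mechanize's addheaders list, but that part is untested here.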
Old 07-04-2010, 10:11 AM   #2248
Gunnerp245
Gadget Freak
Posts: 1,169
Karma: 1043832
Join Date: Nov 2007
Location: US
Device: EE, Note 8
Recipe Not Working Correctly

Standard calibre news recipes:

English (Thailand) Bangkok Post.
English (Singapore) Today Online - Singapore.

Each only downloads the section headings; no articles.
Old 07-04-2010, 10:51 AM   #2249
rty
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by kovidgoyal View Post
@rty: I've added a fixed version of Singtao to the calibre repository; you can get it from there. Unfortunately, I don't remember what the fix was.
Thanks, Kovid, but I have already fixed the SINGTAO Canada recipe and reloaded it into the original post. There was a hidden character on the author line, though I don't know how it got there in the first place. Singtao Canada is the first Traditional Chinese news recipe that I worked on.

Last edited by rty; 07-04-2010 at 10:54 AM.
Old 07-04-2010, 10:58 AM   #2250
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by dwanthny View Post
I hope someone here might be able to work out this piece of the puzzle and get this recipe working again.
I don't know if anyone is going to jump in here. I'm willing to try, if someone here wants to help me with the Google end of it. I have no idea how Google Reader is supposed to work, nor do I have a user/password, nor do I have any content set up to retrieve. I might be able to make the SID/cookie to AUTH header change, particularly if Kovid answers any tricky questions I run into.

If someone wants to set up a test account, add some blogs or sites, or whatever content is needed in Google, put some "stars" or whatever they are into it, give me the user/pass (here or by PM) and help with figuring out what I'm supposed to be getting, I'll take a whack at it.

Of course, I'd prefer that the original author or someone who knows more about mechanize and Google Reader tackles this, so if there is such a person here, please let me know.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.