09-21-2010, 08:12 PM   #2795
bhandarisaurabh
Quote:
Originally Posted by TonytheBookworm
If you're actually trying to modify the built-in recipe, I do not see why. I tested it on my end, and after running it I did not see any articles that were not in the print version. I also ran a test with print statements included, and I do not see anywhere that the original URL is being changed in the way you described. It appears to follow the flow that the original author of the recipe expected. In other words, it's kind of hard to fix something that isn't broken. <shrug>

As for the indents, you have to make sure they are spaced out correctly.

Code:
    def print_version(self, url):
        print 'ORG URL IS: ', url
        split1 = url.split("/")
        print 'THE SPLIT IS: ', split1
        id = len(split1)
        # we want the size of the list split1 because the article id
        # is always its last element; list indices start at 0, so the
        # last element is at index id - 1
        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + split1[id - 1] + '&tp='
        return print_url

**** Notice that the return statement is directly under the print_url statement.

It is giving this error:
ERROR: Invalid input: <p>Could not create recipe. Error:<br>unindent does not match any outer indentation level (recipe46.py, line 51)
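
For anyone hitting the same message: Python reports it when a line is indented less than the line before it but does not line up with any enclosing block, which often happens when pasted code mixes tabs and spaces or uses uneven spacing. Below is a minimal sketch for locating the offending line, assuming plain Python 3 run outside calibre. The helper `check_indentation` and the sample `Demo` class are hypothetical names used only for illustration, not part of calibre.

```python
def check_indentation(source, filename="recipe.py"):
    """Return None if the source compiles, else (line_number, message)."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # IndentationError and TabError are subclasses of SyntaxError
        return err.lineno, err.msg
    return None


# Hypothetical sample: the return line is indented 6 spaces, which matches
# neither the 4-space method level nor the 8-space body level.
bad = (
    "class Demo(object):\n"
    "    def print_version(self, url):\n"
    "        parts = url.split('/')\n"
    "      return parts[-1]\n"
)

# Same code with the return line aligned to the 8-space body level.
good = bad.replace("      return", "        return")

print(check_indentation(bad))
print(check_indentation(good))
```

Running the same kind of check on the saved recipe file shows which line the parser is complaining about, so the indentation of that line can be compared with the lines around it.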

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
www.business-standard.com
'''

from calibre.web.feeds.recipes import BasicNewsRecipe

class BusinessStandard(BasicNewsRecipe):
    title                  = 'Business Standard'
    __author__             = 'Darko Miletic'
    description            = "India's most respected business daily"
    oldest_article         = 7
    max_articles_per_feed  = 100
    no_stylesheets         = True
    use_embedded_content   = False
    encoding               = 'cp1252'
    publisher              = 'Business Standard Limited'
    category               = 'news, business, money, india, world'
    language               = 'en_IN'

    conversion_options = {
                             'comments'        : description
                            ,'tags'            : category
                            ,'language'        : language
                            ,'publisher'       : publisher
                            ,'linearize_tables': True
                         }

    remove_attributes=['style']
    remove_tags = [dict(name=['object','link','script','iframe'])]

    feeds = [
                (u'Todays Newspaper'            , u'http://feeds.business-standard.com/rss/paper.xml'   )
              ,(u'Banking & finance'   , u'http://feeds.business-standard.com/rss/1.xml'   )
              ,(u'Companies & Industry', u'http://feeds.business-standard.com/rss/2.xml')
              ,(u'Economy & Policy'    , u'http://feeds.business-standard.com/rss/3.xml'    )
              ,(u'Opinion and analysis', u'http://feeds.business-standard.com/rss/5_0.xml')
              ,(u'Life & Leisure'      , u'http://feeds.business-standard.com/rss/6_0.xml'      )
              ,(u'Markets & Investing' , u'http://feeds.business-standard.com/rss/12.xml' )
              ,(u'Management & Mktg'   , u'http://feeds.business-standard.com/rss/7_0.xml'   )
              ,(u'Tech World',u'http://feeds.business-standard.com/rss/8_0.xml')
            
            ]

    def print_version(self, url):
        print 'ORG URL IS: ', url
        split1 = url.split("/")
        print 'THE SPLIT IS: ', split1
        id = len(split1)
        # we want the size of the list split1 because the article id
        # is always its last element; list indices start at 0, so the
        # last element is at index id - 1
        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + split1[id - 1] + '&tp='
        return print_url

    def get_article_url(self, article):
        return article.get('guid',  None)
Actually, I am using different feeds from the ones in the built-in recipe.
The RSS feeds page is:
http://feeds.business-standard.com/
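
The URL rewriting that print_version performs can be tried outside calibre before pasting it into the recipe. The following is only a sketch under assumptions: the function name `build_print_url`, the constant `PRINT_TEMPLATE`, and the sample article URL are hypothetical, and it assumes the article id is the last path segment of the URL, as the comments in the posted code state. It also shows why indexing with the list length itself fails.

```python
# Print-page URL pattern taken from the recipe code in this thread.
PRINT_TEMPLATE = 'http://www.business-standard.com/india/printpage.php?autono={0}&tp='


def build_print_url(url):
    """Build the print-page URL from the last path segment of an article URL."""
    # Drop a trailing slash (an assumption) so the last segment is not empty.
    parts = url.rstrip('/').split('/')
    # parts[len(parts)] would raise IndexError; the last item is parts[-1].
    return PRINT_TEMPLATE.format(parts[-1])


# Hypothetical article URL, used only to exercise the function.
sample = 'http://www.business-standard.com/india/news/some-headline/408000/'
print(build_print_url(sample))

# Demonstrate the off-by-one error from indexing with the list length.
parts = sample.rstrip('/').split('/')
try:
    parts[len(parts)]
except IndexError:
    print('index len(parts) is out of range; use parts[-1] instead')
```

If the feeds from this page deliver links in a different shape than the built-in recipe expects, printing the split list for one real link shows which segment actually holds the id.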

Last edited by bhandarisaurabh; 09-21-2010 at 08:40 PM.