Old 09-21-2010, 09:37 PM   #2796
TonytheBookworm
Quote:
Originally Posted by bhandarisaurabh
it is giving this error
ERROR: Invalid input: <p>Could not create recipe. Error:<br>unindent does not match any outer indentation level (recipe46.py, line 51)


actually I am using different feeds as compared to inbuilt recipe
the rss feeds page is
http://feeds.business-standard.com/
Alright, first let's not piggyback on the built-in recipe; since your feeds are different, let's make our own version. I had to test the code to get it right, because the index on the split is 0-based and the very last element is blank (the URL ends in a slash). So if the split list has, say, 8 elements, the article id sits at index 6, not 7. That's why I take the element at index len(split1) - 2.
Anyway, this code works.
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re
class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Business Standard modified'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Business Standard modified'
    publisher = 'Business Standard'
    category = ''
    oldest_article = 5
    max_articles_per_feed = 100
    no_stylesheets = True
    #extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    #masthead_url = 'http://gawand.org/wp-content/uploads/2010/06/ajc-logo.gif'
    #keep_only_tags = [
    #    dict(name='div', attrs={'class':['blogEntryHeader','blogEntryContent']}),
    #    dict(attrs={'id':['cxArticleText','cxArticleBodyText']}),
    #]
    feeds = [
        (u"Today's Newspaper",    u'http://feeds.business-standard.com/rss/paper.xml'),
        (u'Banking & finance',    u'http://feeds.business-standard.com/rss/1.xml'),
        (u'Companies & Industry', u'http://feeds.business-standard.com/rss/2.xml'),
        (u'Economy & Policy',     u'http://feeds.business-standard.com/rss/3.xml'),
        (u'Opinion and analysis', u'http://feeds.business-standard.com/rss/5_0.xml'),
        (u'Life & Leisure',       u'http://feeds.business-standard.com/rss/6_0.xml'),
        (u'Markets & Investing',  u'http://feeds.business-standard.com/rss/12.xml'),
        (u'Management & Mktg',    u'http://feeds.business-standard.com/rss/7_0.xml'),
        (u'Tech World',           u'http://feeds.business-standard.com/rss/8_0.xml'),
    ]
    def print_version(self, url):
        # The article URL ends with a slash, so the last element of the split
        # is an empty string and the article id is the element before it.
        split1 = url.split("/")
        self.log('ORG URL IS: ', url)
        idnum = split1[len(split1) - 2]  # second-to-last element: the article id
        self.log('the idnum is: ', idnum)
        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + idnum + '&tp='
        self.log('PRINT URL IS: ', print_url)
        return print_url
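
If you want to see why the offset is 2, here is a small standalone sketch of the same split logic that runs outside calibre. The sample URL and the helper names article_id and build_print_url are made up for illustration; the real links in the feeds may look different, so treat this as a way to check the indexing rather than as the recipe itself.

```python
# Standalone sketch of the split logic used in print_version.
# SAMPLE_URL is a hypothetical article link, not taken from the real feed.
SAMPLE_URL = 'http://www.business-standard.com/india/news/example-headline/123456/'

PRINT_BASE = 'http://www.business-standard.com/india/printpage.php?autono='


def article_id(url):
    """Return the second-to-last piece of a URL that ends with a slash."""
    parts = url.split("/")
    # The trailing slash leaves an empty string as the last element,
    # so the id is at index len(parts) - 2 (0-based).
    return parts[len(parts) - 2]


def build_print_url(url):
    """Build the print-page URL the same way the recipe does."""
    return PRINT_BASE + article_id(url) + '&tp='


if __name__ == '__main__':
    parts = SAMPLE_URL.split("/")
    print(len(parts), parts)
    print(article_id(SAMPLE_URL))
    print(build_print_url(SAMPLE_URL))
```

With this sample the split has 8 elements and the last one is empty, so the id comes from index 6. If a feed link does not end in a slash, the id would be the last element instead and the offset would need to change.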