Old 07-07-2011, 04:43 AM   #1
throshlyak
Join Date: Jul 2011
Device: kindle3
Recipe for capital.bg

Hello from Bulgaria,

This is my first attempt at creating a custom recipe for a news feed. It is for the Bulgarian newspaper Capital (www.capital.bg).

Code:
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe

class Kapital(BasicNewsRecipe):
    title                 = u'\u041a\u0410\u041f\u0418\u0422\u0410\u041b'
    __author__            = 'Troshlyak'
    __version__           = 'v0.02'
    __date__              = '07, July 2011'
    no_stylesheets        = True
    extra_css             = 'h1 {font: large sans-serif;}\n.byline {font-family: monospace;}'
    encoding              = 'utf8'
    masthead_url          = 'http://www.capital.bg/i/capital_logo.png'
    remove_javascript     = True
    oldest_article        = 7
    max_articles_per_feed = 100

    feeds = [(u'\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430', u'http://www.capital.bg/rss/?rubrid=2248'), 
             (u'\u0411\u0438\u0437\u043d\u0435\u0441', u'http://www.capital.bg/rss/?rubrid=2267'), 
             (u'\u0418\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432', u'http://www.capital.bg/rss/?rubrid=2337')]
             
    remove_tags_before = dict(name='div', attrs={'class':'printwrapper'})
    remove_tags_after = dict(name='div', attrs={'class':'printwrapper'})
    remove_tags = [dict(name='p', attrs={'class':'photoInfo'})]

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        
        br.open(url)
        # Try to follow the link to the print version (printversion.php).
        # If that fails (for example, the page has no such link), fall back
        # to the original article URL instead of crashing.
        try:
            response = br.follow_link(url_regex=r'.*?/printversion\.php\?', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()
         
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name
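
The try/except fallback in get_obfuscated_article can be tried outside calibre. This is only a minimal sketch: FakeBrowser, LinkNotFound and fetch_article are hypothetical names made up for illustration, not part of calibre or of the recipe above. It shows the same idea: follow the print-version link if there is one, otherwise return the original page.

```python
class LinkNotFound(Exception):
    """Hypothetical stand-in for the error raised when no link matches."""


class FakeBrowser:
    """Hypothetical stand-in for the browser object used in the recipe."""

    def __init__(self, pages, print_links):
        self.pages = pages              # maps URL -> page HTML
        self.print_links = print_links  # maps article URL -> print-version URL
        self.current = None

    def open(self, url):
        self.current = url
        return self.pages[url]

    def follow_link(self):
        # Fail the same way a real browser would if the link is missing.
        if self.current not in self.print_links:
            raise LinkNotFound(self.current)
        return self.open(self.print_links[self.current])


def fetch_article(br, url):
    """Return the print version if it is linked, else the original page."""
    br.open(url)
    try:
        return br.follow_link()
    except LinkNotFound:
        return br.open(url)
```

With a browser that knows about a print link for one article only, fetch_article returns the print page for that article and the normal page for any other.
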
The only thing that still bugs me is that, in the print version, there is a nice image at the end of each article. I would like to move it to the top of the article, just below the title.

Can someone help me with this? Here is a sample article from the site:
http://www.capital.bg/printversion.php?storyid=1119112
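
One possible direction, as a rough sketch that has not been tested against the live site: rewrite the HTML text in get_obfuscated_article before it is written to the temporary file. The helper below is hypothetical. It assumes the title is the first `<h1>` on the page and the wanted image is the last `<img>` tag, and neither has been checked against the real markup. It also expects decoded text rather than raw bytes.

```python
import re

# Assumption: the title is an <h1> and the picture is a plain <img> tag.
IMG_RE = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
H1_END_RE = re.compile(r'</h1\s*>', re.IGNORECASE)


def move_last_image_after_title(html):
    """Move the last <img> tag to just after the first closing </h1>."""
    title_end = H1_END_RE.search(html)
    images = list(IMG_RE.finditer(html))
    if title_end is None or not images:
        return html  # nothing to move, leave the page untouched

    img = images[-1]
    if img.start() < title_end.end():
        return html  # the image already sits before the end of the title

    # Cut the image out, then paste it right after the title.
    without_img = html[:img.start()] + html[img.end():]
    cut = title_end.end()
    return without_img[:cut] + img.group(0) + without_img[cut:]
```

Only the `<img>` tag itself is moved, so any wrapper element around it stays behind as an empty element at the end of the article.
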