Hello from Bulgaria,
This is my first attempt at a custom recipe for a news feed. It is for the Bulgarian newspaper
www.capital.bg
Code:
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe


class Kapital(BasicNewsRecipe):
    title = u'\u041a\u0410\u041f\u0418\u0422\u0410\u041b'
    __author__ = 'Troshlyak'
    __version__ = 'v0.02'
    __date__ = '07, July 2011'
    no_stylesheets = True
    # The font shorthand needs a size before the family, so set the properties separately.
    extra_css = 'h1 {font-family: sans-serif; font-size: large;}\n.byline {font-family: monospace;}'
    encoding = 'utf8'
    masthead_url = 'http://www.capital.bg/i/capital_logo.png'
    remove_javascript = True
    oldest_article = 7
    max_articles_per_feed = 100
    feeds = [
        (u'\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430', u'http://www.capital.bg/rss/?rubrid=2248'),
        (u'\u0411\u0438\u0437\u043d\u0435\u0441', u'http://www.capital.bg/rss/?rubrid=2267'),
        (u'\u0418\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432', u'http://www.capital.bg/rss/?rubrid=2337'),
    ]
    # Keep only the print wrapper and drop the photo captions.
    remove_tags_before = dict(name='div', attrs={'class': 'printwrapper'})
    remove_tags_after = dict(name='div', attrs={'class': 'printwrapper'})
    remove_tags = [dict(name='p', attrs={'class': 'photoInfo'})]
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        # Try to follow the link to the print version (printversion.php).
        # If the page has no such link, follow_link raises an exception;
        # instead of crashing, fall back to the original article URL.
        try:
            response = br.follow_link(url_regex='.*?(\\/printversion\\.php\\?)', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()
        # Save the page to a temporary file and hand its path back to calibre.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name
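To make the fallback in get_obfuscated_article easier to follow, here is a small standalone sketch of the same decision using only the standard library. It is not what mechanize does internally: `pick_print_url` is a hypothetical helper, the link extraction is a crude regex over `href` attributes, and the pattern is a simplified form of the one passed to follow_link.

```python
import re
from urllib.parse import urljoin

# Simplified form of the recipe's url_regex (assumption: matched with search()).
PRINT_LINK = re.compile(r'/printversion\.php\?')


def pick_print_url(html, page_url):
    """Return the first printversion.php link found in html, else page_url."""
    # Crude href extraction; good enough for a sketch, not a full HTML parser.
    for href in re.findall(r'<a\s[^>]*href=["\']([^"\']+)["\']', html, flags=re.I):
        if PRINT_LINK.search(href):
            # Resolve relative links against the article URL.
            return urljoin(page_url, href)
    # No print link on the page: fall back to the article itself.
    return page_url
```

The recipe itself gets the same effect by letting follow_link raise when no link matches and catching the exception.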
The only thing that still bugs me is that at the end of each article in the print version there is a nice image that I would like to move to the top of the article, just below the title.
Can someone help me with this? Here is a sample article from the site:
http://www.capital.bg/printversion.php?storyid=1119112
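One possible direction, as an untested sketch rather than a working fix: rearrange the page text before calibre parses it. The helper below is hypothetical, and it assumes that the title is the first h1 on the page and that the image I want is the last img tag; I have not checked either against the real markup. It moves only the img tag itself, so any wrapper element around the image stays where it was.

```python
import re


def move_last_image_after_title(html):
    """Move the last <img> tag so it sits right after the first </h1>."""
    images = list(re.finditer(r'<img\b[^>]*>', html, flags=re.I))
    title_end = re.search(r'</h1\s*>', html, flags=re.I)
    if not images or title_end is None:
        return html  # nothing to move, leave the page untouched
    img = images[-1]
    if img.start() < title_end.end():
        return html  # the last image is already above the end of the title
    # Cut the image out, then paste it back in right after the title.
    # The image comes after the title, so the title offset does not shift.
    without_img = html[:img.start()] + html[img.end():]
    cut = title_end.end()
    return without_img[:cut] + img.group(0) + without_img[cut:]
```

Something like this could be called on the page text in get_obfuscated_article just before it is written to the temporary file, assuming the text is a string and not raw bytes.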