07-07-2011, 04:43 AM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2011
Device: kindle3
|
Recipe for capital.bg
Hello from Bulgaria,
This is my first attempt at creating a custom recipe for a news feed. It is for a Bulgarian newspaper, www.capital.bg
Code:
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe


class Kapital(BasicNewsRecipe):
    title = u'\u041a\u0410\u041f\u0418\u0422\u0410\u041b'
    __author__ = 'Troshlyak'
    __version__ = 'v0.02'
    __date__ = '07, July 2011'
    no_stylesheets = True
    # The CSS "font" shorthand needs both a size and a family, so the
    # properties are set separately here.
    extra_css = 'h1 {font-family: sans-serif; font-size: large;}\n.byline {font-family: monospace;}'
    encoding = 'utf8'
    masthead_url = 'http://www.capital.bg/i/capital_logo.png'
    remove_javascript = True
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [
        (u'\u041f\u043e\u043b\u0438\u0442\u0438\u043a\u0430', u'http://www.capital.bg/rss/?rubrid=2248'),
        (u'\u0411\u0438\u0437\u043d\u0435\u0441', u'http://www.capital.bg/rss/?rubrid=2267'),
        (u'\u0418\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432', u'http://www.capital.bg/rss/?rubrid=2337'),
    ]

    remove_tags_before = dict(name='div', attrs={'class': 'printwrapper'})
    remove_tags_after = dict(name='div', attrs={'class': 'printwrapper'})
    remove_tags = [dict(name='p', attrs={'class': 'photoInfo'})]

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        # Try to follow the article's link to printversion.php. If that
        # fails (for example, the page has no such link), fall back to the
        # original article URL instead of crashing.
        try:
            response = br.follow_link(url_regex=r'.*?(\/printversion\.php\?)', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()

        # Save the page to a temporary file and hand its path back to calibre.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

Can someone help me with this? Here is a sample article from the site itself: http://www.capital.bg/printversion.php?storyid=1119112
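As a quick sanity check outside calibre, the url_regex from the recipe can be tried against the sample URL with Python's re module. This is only a sketch: it assumes follow_link applies the pattern as a regular-expression search on each link's URL, and the names PRINT_LINK_PATTERN, sample_print_url and feed_url are made up for the illustration. The second URL is just one of the feed addresses from the recipe, used as an example of a link that should not match.

```python
import re

# Same pattern as the recipe's url_regex, written as a raw string.
PRINT_LINK_PATTERN = r'.*?(\/printversion\.php\?)'

# Sample print-version URL quoted in this post.
sample_print_url = 'http://www.capital.bg/printversion.php?storyid=1119112'
# One of the recipe's feed URLs, used here as a non-matching example.
feed_url = 'http://www.capital.bg/rss/?rubrid=2248'

# True when the pattern is found in the URL, False otherwise.
matches_print = re.search(PRINT_LINK_PATTERN, sample_print_url) is not None
matches_feed = re.search(PRINT_LINK_PATTERN, feed_url) is not None

print(matches_print, matches_feed)
```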
07-07-2011, 10:34 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
If you can restructure your recipe with keep_only_tags and list the image tag first, then the other tags you want to keep, that will cause a reordering. Alternatively, you can use BeautifulSoup and postprocess_html to reshuffle the tags.
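A rough sketch of both options follows. It is not tested against the site: it assumes the article image is a plain img tag inside the printwrapper div from the recipe above, and the class names KapitalKeepOnly and KapitalPostprocess are made up for the example. The usual recipe fields (title, feeds and so on) are left out, and the try/except around the import is only a stand-in so the snippet can be read or loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch loads outside calibre; not part of a real recipe.
    class BasicNewsRecipe(object):
        pass


class KapitalKeepOnly(BasicNewsRecipe):
    # Option 1: kept tags are collected in the order they are listed,
    # so putting the image spec first moves images ahead of the text.
    # Assumption: matching every img tag is acceptable for this site.
    keep_only_tags = [
        dict(name='img'),
        dict(name='div', attrs={'class': 'printwrapper'}),
    ]


class KapitalPostprocess(BasicNewsRecipe):
    # Option 2: move the first image to the top of the wrapper div
    # after calibre has parsed the page into a BeautifulSoup tree.
    def postprocess_html(self, soup, first_fetch):
        wrapper = soup.find('div', attrs={'class': 'printwrapper'})
        img = wrapper.find('img') if wrapper is not None else None
        if img is not None:
            img.extract()           # detach the image from its old position
            wrapper.insert(0, img)  # re-insert it as the first child
        return soup
```

With option 1, remove_tags_before and remove_tags_after on the same div should no longer be needed, since keep_only_tags already discards everything outside the listed tags.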