02-09-2011, 09:14 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Mar 2010
Device: Sony PRS-600
|
Kompas (v1.0) - Indonesian
Hi All,
This is my first recipe, for the Indonesian newspaper Kompas. Special thanks to calibre's developers! I totally love it: my two-hour commute to and from work is no longer boring and is in fact very productive, since I can read newspapers downloaded with calibre. Anyway, here's the recipe. I hope you find it useful. Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2011, Adrian Gunawan <agunawan at adrnalin.com>'
__author__ = 'Adrian Gunawan'
__version__ = 'v1.0'
__date__ = '02 February 2011'

'''
http://www.kompas.com/
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe


class Kompas(BasicNewsRecipe):
    title = u'Kompas'
    masthead_url = 'http://stat.k.kidsklik.com/data/2k10/kompascom2011/images/logo_kompas.png'
    cover_url = 'http://stat.k.kidsklik.com/data/2k10/kompascom2011/images/logo_kompas.png'
    __author__ = u'Adrian Gunawan'
    description = u'Indonesian News from Kompas Online Edition'
    category = 'local news, international, business, Indonesia'
    language = 'id'
    oldest_article = 5
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    # The original post used "no_javascript"; the BasicNewsRecipe attribute is "remove_javascript".
    remove_javascript = True
    remove_empty_feeds = True
    timefmt = ' [%A, %d %B, %Y]'
    encoding = 'utf-8'

    # Keep only the main article container.
    keep_only_tags = [dict(name='div', attrs={'class': 'content_kiri_detail'})]

    extra_css = '''
        h1{font-family:Georgia,"Times New Roman",Times,serif; font-weight:bold; font-size:large;}
        .cT-storyDetails{font-family:Arial,Helvetica,sans-serif; color:#666666;font-size:x-small;}
        .articleBody{font-family:Arial,Helvetica,sans-serif; color:black;font-size:small;}
        .cT-imageLandscape{font-family:Arial,Helvetica,sans-serif; color:#333333 ;font-size:x-small;}
        .source{font-family:Arial,Helvetica,sans-serif; color:#333333 ;font-size:xx-small;}
        #content{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
        .pageprint{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        #bylineDetails{font-family:Arial,Helvetica,sans-serif; color:#666666;font-size:x-small;}
        .featurePic-wide{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
        #idfeaturepic{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
        h3{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
        h2{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
        h4{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
        h5{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
        body{font-family:Arial,Helvetica,sans-serif; font-size:x-small;}
    '''

    # Drop sidebars, comment widgets, share boxes, forms and lists.
    remove_tags = [
        dict(name='div', attrs={'class': ['c_biru_kompas2011', 'c_abu01_kompas2011', 'c_abu_01_kompas2011', 'right', 'clearit']}),
        dict(name='div', attrs={'id': ['comment_list', 'comment_paging', 'share']}),
        dict(name='form'),
        dict(name='ul'),
    ]

    # Strip related-links blocks and leftover boilerplate text from the raw HTML.
    preprocess_regexps = [
        (re.compile(r'<!--TERKAIT -->.*<!--TERKAIT END -->', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<strong>Sent Using.*</body>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<strong>Kirim Komentar Anda</strong>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<a[^>]*>Kembali ke Index Topik Pilihan</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),
    ]

    feeds = [
        (u'Nasional', u'http://www.kompas.com/getrss/nasional'),
        (u'Regional', u'http://www.kompas.com/getrss/regional'),
        (u'Internasional', u'http://www.kompas.com/getrss/internasional'),
        (u'Megapolitan', u'http://www.kompas.com/getrss/megapolitan'),
        (u'Bisnis Keuangan', u'http://www.kompas.com/getrss/bisniskeuangan'),
        (u'Kesehatan', u'http://www.kompas.com/getrss/kesehatan'),
        (u'Olahraga', u'http://www.kompas.com/getrss/olahraga'),
    ]
|
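For anyone who wants to see what the preprocess_regexps entries in the recipe above strip out without running a full download, here is a minimal sketch using only the standard library. It assumes the rules are applied one after another to the raw HTML, which is roughly how calibre documents them; `apply_preprocess` is a hypothetical helper and `SAMPLE_HTML` is made-up markup, not real Kompas output.

```python
import re

# Same (compiled pattern, replacement function) pairs as in the recipe.
PREPROCESS_REGEXPS = [
    (re.compile(r'<!--TERKAIT -->.*<!--TERKAIT END -->', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'<strong>Sent Using.*</body>', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'<strong>Kirim Komentar Anda</strong>', re.DOTALL | re.IGNORECASE), lambda match: ''),
    (re.compile(r'<a[^>]*>Kembali ke Index Topik Pilihan</a>', re.DOTALL | re.IGNORECASE), lambda match: ''),
]


def apply_preprocess(html, rules=PREPROCESS_REGEXPS):
    """Hypothetical helper: run each rule over the HTML in list order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up sample page (an assumption for illustration only).
SAMPLE_HTML = (
    '<html><body><div class="content_kiri_detail">'
    '<h1>Judul</h1><p>Isi berita.</p>'
    '<!--TERKAIT --><ul><li>Related</li></ul><!--TERKAIT END -->'
    '<strong>Kirim Komentar Anda</strong>'
    '<a href="/topik">Kembali ke Index Topik Pilihan</a>'
    '</div></body></html>'
)

cleaned = apply_preprocess(SAMPLE_HTML)
print(cleaned)
```

Note that `.*` with re.DOTALL is greedy, so if a page ever contained two TERKAIT blocks, everything between the first opening marker and the last closing marker would be removed.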
01-18-2012, 05:20 AM | #2 |
Junior Member
Posts: 3
Karma: 13070
Join Date: Dec 2011
Location: Perth, Western Australia
Device: Amazon Kindle Touch
|
Hi adrnalin,
Thank you for your work. I've used calibre to fetch news from the Jakarta Post and Kompas.com. The former works flawlessly and its links are correct. But when I fetch Kompas, many links don't work, and sometimes I get news dated three months back. I've tried many times, and I seem to get a different result for Kompas each time. Could you check your code again? There might be a thing or two missing, or is the Kompas.com site structure simply not compatible with calibre's fetching options? |
01-20-2012, 05:55 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Mar 2010
Device: Sony PRS-600
|
Hi amristadi,
I found that the Kompas site structure was a mess. I swear the recipe used to work, but they may have changed their website structure. I will look at it in a few days. If you haven't heard from me in a week or so, please reply to this post to remind me. Cheers! |
01-27-2012, 04:33 AM | #4 | |
Junior Member
Posts: 3
Karma: 13070
Join Date: Dec 2011
Location: Perth, Western Australia
Device: Amazon Kindle Touch
|
I managed to tweak your recipe and found that Kompas now uses a different address for the "Nasional" section RSS feed. It worked the first couple of times, but then a fetch returned only the article titles, with the news body missing (this only happened in the "Nasional" section). Today I fetched again: the Nasional section worked reliably, but the other sections returned news from August 18th. When I checked the website, the RSS feeds themselves were dated August 18th. You're right, it's a mess and there is nothing we can do about it. So I guess there's no need for you to look at it; your recipe still works. When you have time, maybe you can change the RSS feed address for the "Nasional" section and tidy up a little by removing the social sharing icons below the article title. Cheers. |
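A quick way to confirm that a feed itself is serving old items, independent of calibre, is to read the pubDate of each entry. The following is a minimal sketch with the standard library only; `stale_items` is a hypothetical helper, the 5-day limit mirrors oldest_article in the recipe, and `SAMPLE_RSS` is invented data (including the year), not a real Kompas feed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def stale_items(rss_xml, now, max_age_days=5):
    """Hypothetical helper: list (title, published) for items older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for item in ET.fromstring(rss_xml).iter('item'):
        title = item.findtext('title', default='')
        pub = item.findtext('pubDate')
        if pub is None:
            continue  # no date in the feed entry, nothing to compare
        published = parsedate_to_datetime(pub)
        if published.tzinfo is None:
            # Treat dates without a usable offset as UTC (an assumption).
            published = published.replace(tzinfo=timezone.utc)
        if published < cutoff:
            stale.append((title, published))
    return stale


# Invented sample feed for illustration only.
SAMPLE_RSS = (
    '<rss version="2.0"><channel><title>Sample</title>'
    '<item><title>Berita baru</title>'
    '<pubDate>Fri, 27 Jan 2012 08:00:00 +0700</pubDate></item>'
    '<item><title>Berita lama</title>'
    '<pubDate>Thu, 18 Aug 2011 10:00:00 +0700</pubDate></item>'
    '</channel></rss>'
)

now = datetime(2012, 1, 27, 12, 0, tzinfo=timezone.utc)
old = stale_items(SAMPLE_RSS, now)
for title, published in old:
    print(title, published.isoformat())
```

To check a live feed, the XML text of one of the getrss URLs from the recipe could be passed in place of SAMPLE_RSS, with `now` set to `datetime.now(timezone.utc)`. If every item in a section comes back as stale, the problem is on the feed side rather than in the recipe.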