Old 02-09-2011, 10:14 PM   #1
adrnalin
Junior Member
Posts: 4
Karma: 10
Join Date: Mar 2010
Device: Sony PRS-600
Kompas (v1.0) - Indonesian

Hi All,

This is my first recipe, for the Indonesian newspaper Kompas. I hope you find it useful.

Special thanks to calibre's developers! I totally love it: my two-hour commute to and from work is no longer boring, and in fact very productive, now that I can read newspapers downloaded with calibre.

Anyway, here's the recipe:

Code:
#!/usr/bin/env python
__license__   = 'GPL v3'
__copyright__ = '2011, Adrian Gunawan <agunawan at adrnalin.com>'
__author__    = 'Adrian Gunawan'
__version__   = 'v1.0'
__date__      = '02 February 2011'

'''
http://www.kompas.com/
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Kompas(BasicNewsRecipe):
    title          = u'Kompas'
    masthead_url   = 'http://stat.k.kidsklik.com/data/2k10/kompascom2011/images/logo_kompas.png'
    cover_url      = 'http://stat.k.kidsklik.com/data/2k10/kompascom2011/images/logo_kompas.png'

    __author__     = u'Adrian Gunawan'
    description    = u'Indonesian News from Kompas Online Edition'
    category       = 'local news, international, business, Indonesia'
    language       = 'id'
    oldest_article = 5
    max_articles_per_feed = 100

    no_stylesheets        = True
    use_embedded_content  = False
    no_javascript         = True
    remove_empty_feeds    = True

    timefmt               = ' [%A, %d %B, %Y]'
    encoding              = 'utf-8'

    keep_only_tags = [dict(name='div', attrs ={'class':'content_kiri_detail'})]

    extra_css = '''
                  h1{font-family:Georgia,"Times New Roman",Times,serif; font-weight:bold; font-size:large;}
                  .cT-storyDetails{font-family:Arial,Helvetica,sans-serif; color:#666666;font-size:x-small;}
                  .articleBody{font-family:Arial,Helvetica,sans-serif; color:black;font-size:small;}
                  .cT-imageLandscape{font-family:Arial,Helvetica,sans-serif; color:#333333 ;font-size:x-small;}
                  .source{font-family:Arial,Helvetica,sans-serif; color:#333333 ;font-size:xx-small;}
                  #content{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
                  .pageprint{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                  #bylineDetails{font-family:Arial,Helvetica,sans-serif; color:#666666;font-size:x-small;}
                  .featurePic-wide{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
                  #idfeaturepic{font-family:Arial,Helvetica,sans-serif;font-size:x-small;}
                  h3{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
                  h2{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
                  h4{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
                  h5{font-family:Georgia,"Times New Roman",Times,serif; font-size:small;}
                  body{font-family:Arial,Helvetica,sans-serif; font-size:x-small;}
                '''

    remove_tags     = [
                        dict(name='div', attrs ={'class':['c_biru_kompas2011', 'c_abu01_kompas2011', 'c_abu_01_kompas2011', 'right', 'clearit']}),
                        dict(name='div', attrs ={'id':['comment_list', 'comment_paging', 'share']}),
                        dict(name='form'),
                        dict(name='ul'),
                       ]

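    # Each entry is (compiled pattern, replacement function); every match is replaced by ''.
    preprocess_regexps = [
                          # Drop the "TERKAIT" (related articles) block.
                          (re.compile(r'<!--TERKAIT -->.*<!--TERKAIT END -->', re.DOTALL|re.IGNORECASE),lambda match: ''),
                          # Drop everything from the "Sent Using" footer to the end of the body.
                          (re.compile(r'<strong>Sent Using.*</body>', re.DOTALL|re.IGNORECASE),lambda match: ''),
                          # Drop the "Kirim Komentar Anda" (send your comment) heading.
                          (re.compile(r'<strong>Kirim Komentar Anda</strong>', re.DOTALL|re.IGNORECASE),lambda match: ''),
                          # Drop the "Kembali ke Index Topik Pilihan" (back to topic index) link.
                          (re.compile(r'<a[^>]*>Kembali ke Index Topik Pilihan</a>', re.DOTALL|re.IGNORECASE),lambda match: ''),
                         ]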

    feeds          = [
                      (u'Nasional', u'http://www.kompas.com/getrss/nasional'),
                      (u'Regional', u'http://www.kompas.com/getrss/regional'),
                      (u'Internasional', u'http://www.kompas.com/getrss/internasional'),
                      (u'Megapolitan', u'http://www.kompas.com/getrss/megapolitan'),
                      (u'Bisnis Keuangan', u'http://www.kompas.com/getrss/bisniskeuangan'),
                      (u'Kesehatan', u'http://www.kompas.com/getrss/kesehatan'),
                      (u'Olahraga', u'http://www.kompas.com/getrss/olahraga'),
                      ]
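
For anyone tweaking the recipe, here is a small standalone sketch of what two of the preprocess_regexps entries do to a page. It only needs Python's re module, not calibre. The sample HTML is made up for illustration (it is not copied from kompas.com), and the loop is only an approximation of how the (pattern, function) pairs get applied, so treat it as a way to experiment with the patterns rather than as calibre's actual code.

```python
import re

# Two (pattern, replacement function) pairs in the same shape the recipe uses.
rules = [
    (re.compile(r'<!--TERKAIT -->.*<!--TERKAIT END -->', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
    (re.compile(r'<strong>Kirim Komentar Anda</strong>', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]

# ASSUMPTION: made-up sample markup, not taken from the real site.
sample = (
    '<div class="content_kiri_detail"><p>Isi berita.</p>'
    '<!--TERKAIT -->\n<ul><li>Berita terkait</li></ul>\n<!--TERKAIT END -->'
    '<strong>Kirim Komentar Anda</strong></div>'
)

def apply_rules(html):
    """Run every rule over the HTML in order, replacing each match."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

cleaned = apply_rules(sample)
print(cleaned)
# prints: <div class="content_kiri_detail"><p>Isi berita.</p></div>
```

Note that the TERKAIT pattern uses a greedy `.*` with re.DOTALL, so if a page ever contains two TERKAIT blocks, everything between the first opening comment and the last closing comment would be removed as well.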
Old 01-18-2012, 06:20 AM   #2
amristadi
Junior Member
Posts: 3
Karma: 13070
Join Date: Dec 2011
Location: Perth, Western Australia
Device: Amazon Kindle Touch
Hi adrnalin,

Thank you for your work. I've used calibre to fetch news from the Jakarta Post and Kompas.com. The former works flawlessly and its links are correct. But when I tried to fetch Kompas, I found that many links don't work, and sometimes I got news dated three months back. I've tried many times, and I seem to get a different result for Kompas every time.

Could you check your code again? You might be missing a thing or two. Or is it that the Kompas.com site structure is not compatible with calibre's fetching options?
Old 01-20-2012, 06:55 PM   #3
adrnalin
Hi amristadi,

I found that the Kompas site structure is a mess. I swear the recipe used to work, but they may have changed their website structure. I will look at it in a few days. If you haven't heard from me in a week or so, please reply to this post to remind me.
Cheers!
Old 01-27-2012, 05:33 AM   #4
amristadi
Quote:
Originally Posted by adrnalin
Hi amristadi,

I found that the Kompas site structure is a mess. I swear the recipe used to work, but they may have changed their website structure. I will look at it in a few days. If you haven't heard from me in a week or so, please reply to this post to remind me.
Cheers!
Thanks,
Somehow I managed to tweak your recipe, and found that Kompas now uses another address for the "Nasional" section RSS feed. It worked the first couple of times, but after that I only got the article titles when fetching; the news body was missing (this only happened in the "Nasional" section).
Today I tried to fetch again. The Nasional section works solidly, but the other sections fetched news from August 18th. When I checked the website, the RSS feeds themselves were actually dated August 18th. You're right, it's a mess and there's nothing we can do about it.
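
In case it helps anyone else check whether a feed itself is serving old items, here is a rough standalone sketch that lists items older than a cutoff. It uses only the Python standard library. The RSS text is a made-up sample (the real Kompas feeds may be structured differently), the function name stale_items is hypothetical, and the 5-day default simply mirrors the recipe's oldest_article value; it is not how calibre does its own filtering.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# ASSUMPTION: made-up RSS sample for illustration only.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Sample</title>
  <item><title>Old story</title><pubDate>Thu, 18 Aug 2011 08:00:00 +0700</pubDate></item>
  <item><title>Fresh story</title><pubDate>Thu, 26 Jan 2012 09:30:00 +0700</pubDate></item>
</channel></rss>"""

def stale_items(rss_text, now, max_age_days=5):
    """Return the titles of items whose pubDate is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for item in ET.fromstring(rss_text).iter('item'):
        # pubDate is an RFC 822 style date; this yields a timezone-aware datetime.
        published = parsedate_to_datetime(item.findtext('pubDate'))
        if published < cutoff:
            stale.append(item.findtext('title'))
    return stale

# A fixed "now" keeps the example reproducible.
now = datetime(2012, 1, 27, tzinfo=timezone.utc)
print(stale_items(SAMPLE_RSS, now))
# prints: ['Old story']
```

If every item in a section comes back as stale, the problem is on the feed's side rather than in the recipe.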

So I guess there's no need for you to look at it; your recipe still works. When you have the time, maybe you can change the RSS feed address for the "Nasional" section and tidy up a little by removing the social sharing icons below the article title.
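
Something along these lines is what I mean, as a plain Python sketch of the two changes to the recipe's lists. Both new values are HYPOTHETICAL placeholders: the replacement feed address and the class name of the sharing block are not given anywhere in this thread, so they would have to be filled in from the live site. The helper name with_replaced_feed is also made up for the example.

```python
# Feed list in the same shape as the recipe's (shortened to two entries).
feeds = [
    (u'Nasional', u'http://www.kompas.com/getrss/nasional'),
    (u'Regional', u'http://www.kompas.com/getrss/regional'),
]

# HYPOTHETICAL placeholder: the real new address is not stated in this thread.
NEW_NASIONAL_URL = u'http://example.invalid/new-nasional-rss'

def with_replaced_feed(feed_list, title, new_url):
    """Return a copy of feed_list with the URL of the named feed swapped."""
    return [(t, new_url if t == title else url) for t, url in feed_list]

updated_feeds = with_replaced_feed(feeds, u'Nasional', NEW_NASIONAL_URL)

# Existing remove_tags entry from the recipe, plus one extra rule.
remove_tags = [
    dict(name='div', attrs={'id': ['comment_list', 'comment_paging', 'share']}),
]
# HYPOTHETICAL class name for the sharing icons under the title.
remove_tags = remove_tags + [dict(name='div', attrs={'class': 'hypothetical_share_icons'})]

print(updated_feeds[0][1])
print(len(remove_tags))
```

The resulting lists would go back into the recipe class as its feeds and remove_tags attributes.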

cheers.
All times are GMT -4.