Thread: Mingpao HK News
09-30-2010, 01:47 AM #2
esurfer
Device: kindle 3
Hi, here is a recipe for Oriental Daily. I have only tested it on the Kindle 3 and Kindle for PC, and it seems to work.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Larry Chan <larry1chan at gmail.com>'
'''
oriental daily
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class OrientalDaily(BasicNewsRecipe):
    title                  = 'Oriental Daily'
    __author__             = 'Larry Chan, larry1chan'
    description            = 'News from HK'
    oldest_article         = 2
    max_articles_per_feed  = 100
    simultaneous_downloads = 5
    no_stylesheets         = True
    #delay                  = 1
    use_embedded_content   = False
    encoding               = 'utf8'
    publisher              = 'Oriental Daily'
    category               = 'news, HK, world'
    language               = 'zh-hk'
    publication_type       = 'newsportal'
    extra_css              = ' body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
   
    conversion_options = {
                             'comments'        : description
                            ,'tags'            : category
                            ,'language'        : language
                            ,'publisher'       : publisher
                            ,'linearize_tables': True
                         }

    remove_tags_after  = dict(id='bottomNavCTN')

    keep_only_tags    = [
                         dict(name='div', attrs={'id': ['leadin', 'contentCTN-right']})
                        ]

    remove_tags = [
                   dict(name='div', attrs={'class': ['largeAdsCTN', 'contentCTN-left', 'textAdsCTN', 'footerAds clear']}),
                   dict(name='div', attrs={'id': ['articleNav']})
                  ]

    remove_attributes = ['width','height','href']


    feeds          = [(u'Oriental Daily', u'http://orientaldaily.on.cc/rss/news.xml')]
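For anyone adapting this recipe to another site: `keep_only_tags`, `remove_tags` and `remove_attributes` are what decide which parts of each article page survive. Below is a minimal standalone sketch of that filtering under simplified assumptions. It uses plain `xml.etree.ElementTree` instead of calibre's BeautifulSoup-based code, compares attribute values as exact whole strings, and does not model `remove_tags_after`. The `clean()` helper and the sample page are hypothetical, made up for illustration and not taken from orientaldaily.on.cc.

```python
import xml.etree.ElementTree as ET

# Rules mirroring the recipe's settings, as (tag, attribute, allowed values).
# Simplification (assumption): the attribute must equal one of the values exactly.
KEEP_ONLY = [("div", "id", ["leadin", "contentCTN-right"])]
REMOVE = [
    ("div", "class", ["largeAdsCTN", "contentCTN-left", "textAdsCTN", "footerAds clear"]),
    ("div", "id", ["articleNav"]),
]
STRIP_ATTRS = ["width", "height", "href"]

# Hypothetical article page, invented for this example.
SAMPLE = """<body>
<div id="header">site header</div>
<div id="leadin"><h1>Headline</h1><img src="a.jpg" width="300" height="200"/></div>
<div id="contentCTN-right">
  <p>Story text <a href="/x">link</a></p>
  <div class="textAdsCTN">text ad</div>
  <div id="articleNav">next/prev</div>
</div>
<div class="contentCTN-left">sidebar</div>
</body>"""


def matches(el, rules):
    """True if el has a rule's tag name and an allowed value for that rule's attribute."""
    return any(el.tag == tag and el.get(attr) in values for tag, attr, values in rules)


def clean(body):
    """Hypothetical helper: apply keep-only, remove and strip-attribute rules to a <body>."""
    # keep_only_tags: build a new <body> from the matching elements, rule by rule,
    # in document order (nested matches are not de-duplicated in this sketch).
    new_body = ET.Element("body")
    for tag, attr, values in KEEP_ONLY:
        for el in body.iter(tag):
            if el.get(attr) in values:
                new_body.append(el)

    # remove_tags: collect matching children first, then detach them,
    # so the tree is not modified while it is being iterated.
    doomed = [(parent, child) for parent in new_body.iter()
              for child in parent if matches(child, REMOVE)]
    for parent, child in doomed:
        parent.remove(child)

    # remove_attributes: drop the listed attributes from every remaining element.
    for el in new_body.iter():
        for name in STRIP_ATTRS:
            el.attrib.pop(name, None)
    return new_body


if __name__ == "__main__":
    print(ET.tostring(clean(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Run on the sample, only the `leadin` and `contentCTN-right` divs are kept, the ad and navigation divs inside them are dropped, and `width`, `height` and `href` are stripped. That is roughly why the images end up unsized and the links unclickable in the generated book.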

Last edited by kovidgoyal; 10-07-2010 at 12:23 PM.