#1
Connoisseur
Posts: 82 | Karma: 10 | Join Date: Oct 2010 | Device: Kindle
A few new and updated recipes (Chinese newspapers)
Congratulations on the 0.8.0 release of Calibre! I hope its chief developer, Kovid, enjoys a well-deserved vacation.
Would Kovid consider including the recipes below in the next release of Calibre? Thanks.

(Updated) Ming Pao - Hong Kong
Code:
__license__   = 'GPL v3'
__copyright__ = '2010-2011, Eddie Lau'

# Users of Kindle 3 with limited system-level CJK support
# please replace the following "True" with "False".
__MakePeriodical__ = True
# Turn below to true if your device supports display of CJK titles
__UseChineseTitle__ = True
# Trun below to true if you wish to use life.mingpao.com as the main article source
__UseLife__ = True

'''
Change Log:
2011/05/12: switch the main parse source to life.mingpao.com, which has more photos on the article pages
2011/03/06: add new articles for finance section, also a new section "Columns"
2011/02/28: rearrange the sections
            [Disabled until Kindle has better CJK support and can remember last (section,article) read in
            Sections & Articles View] make it the same title if generating a periodical, so past issue will
            be automatically put into "Past Issues" folder in Kindle 3
2011/02/20: skip duplicated links in finance section, put photos which may extend a whole page to the back
            of the articles, clean up the indentation
2010/12/07: add entertainment section, use newspaper front page as ebook cover, suppress date display in
            section list (to avoid wrong date display in case the user generates the ebook in a time zone
            different from HKT)
2010/11/22: add English section, remove eco-news section which is not updated daily, correct ordering of articles
2010/11/12: add news image and eco-news section
2010/11/08: add parsing of finance section
2010/11/06: temporary work-around for Kindle device having no capability to display unicode in section/article list.
2010/10/31: skip repeated articles in section pages
'''

import os, datetime, time, re
from calibre.web.feeds.recipes import BasicNewsRecipe
from collections import defaultdict
from functools import partial
from contextlib import nested, closing
from calibre import browser, __appname__, iswindows, strftime, preferred_encoding
from calibre.ebooks.BeautifulSoup import BeautifulSoup, NavigableString, CData, Tag
from calibre.ebooks.metadata.opf2 import OPFCreator
from calibre import entity_to_unicode
from calibre.web import Recipe
from calibre.ebooks.metadata.toc import TOC
from calibre.ebooks.metadata import MetaInformation
from calibre.web.feeds import feed_from_xml, templates, feeds_from_index, Feed
from calibre.web.fetch.simple import option_parser as web2disk_option_parser
from calibre.web.fetch.simple import RecursiveFetcher
from calibre.utils.threadpool import WorkRequest, ThreadPool, NoResultsPending
from calibre.ptempfile import PersistentTemporaryFile
from calibre.utils.date import now as nowf
from calibre.utils.magick.draw import save_cover_data_to, add_borders_to_image


class MPHKRecipe(BasicNewsRecipe):
    title = 'Ming Pao - Hong Kong'
    oldest_article = 1
    max_articles_per_feed = 100
    __author__ = 'Eddie Lau'
    description = 'Hong Kong Chinese Newspaper (http://news.mingpao.com)'
    publisher = 'MingPao'
    category = 'Chinese, News, Hong Kong'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    language = 'zh'
    encoding = 'Big5-HKSCS'
    recursions = 0
    conversion_options = {'linearize_tables':True}
    timefmt = ''
    extra_css = 'img {display: block; margin-left: auto; margin-right: auto; margin-top: 10px; margin-bottom: 10px;} font>b {font-size:200%; font-weight:bold;}'
    masthead_url = 'http://news.mingpao.com/image/portals_top_logo_news.gif'
    keep_only_tags = [dict(name='h1'),
                      dict(name='font', attrs={'style':['font-size:14pt; line-height:160%;']}), # for entertainment page title
                      dict(name='font', attrs={'color':['AA0000']}), # for column articles title
                      dict(attrs={'id':['newscontent']}), # entertainment and column page content
                      dict(attrs={'id':['newscontent01','newscontent02']}),
                      dict(attrs={'class':['photo']}),
                      dict(name='img', attrs={'width':['180'], 'alt':['按圖放大']}) # images for source from life.mingpao.com
                      ]
    remove_tags = [dict(name='style'),
                   dict(attrs={'id':['newscontent135']}), # for the finance page from mpfinance.com
                   dict(name='table')] # for content fetched from life.mingpao.com
    remove_attributes = ['width']
    preprocess_regexps = [
        (re.compile(r'<h5>', re.DOTALL|re.IGNORECASE),
         lambda match: '<h1>'),
        (re.compile(r'</h5>', re.DOTALL|re.IGNORECASE),
         lambda match: '</h1>'),
        (re.compile(r'<p><a href=.+?</a></p>', re.DOTALL|re.IGNORECASE), # for entertainment page
         lambda match: ''),
        # skip <br> after title in life.mingpao.com fetched article
        (re.compile(r"<div id='newscontent'><br>", re.DOTALL|re.IGNORECASE),
         lambda match: "<div id='newscontent'>"),
        (re.compile(r"<br><br></b>", re.DOTALL|re.IGNORECASE),
         lambda match: "</b>")
        ]

    def image_url_processor(cls, baseurl, url):
        # trick: break the url at the first occurance of digit, add an additional
        # '_' at the front
        # not working, may need to move this to preprocess_html() method
        # minIdx = 10000
        # i0 = url.find('0')
        # if i0 >= 0 and i0 < minIdx:
        #     minIdx = i0
        # i1 = url.find('1')
        # if i1 >= 0 and i1 < minIdx:
        #     minIdx = i1
        # i2 = url.find('2')
        # if i2 >= 0 and i2 < minIdx:
        #     minIdx = i2
        # i3 = url.find('3')
        # if i3 >= 0 and i0 < minIdx:
        #     minIdx = i3
        # i4 = url.find('4')
        # if i4 >= 0 and i4 < minIdx:
        #     minIdx = i4
        # i5 = url.find('5')
        # if i5 >= 0 and i5 < minIdx:
        #     minIdx = i5
        # i6 = url.find('6')
        # if i6 >= 0 and i6 < minIdx:
        #     minIdx = i6
        # i7 = url.find('7')
        # if i7 >= 0 and i7 < minIdx:
        #     minIdx = i7
        # i8 = url.find('8')
        # if i8 >= 0 and i8 < minIdx:
        #     minIdx = i8
        # i9 = url.find('9')
        # if i9 >= 0 and i9 < minIdx:
        #     minIdx = i9
        return url

    def get_dtlocal(self):
        dt_utc = datetime.datetime.utcnow()
        # convert UTC to local hk time - at around HKT 6.00am, all news are available
        dt_local = dt_utc - datetime.timedelta(-2.0/24)
        return dt_local

    def get_fetchdate(self):
        return self.get_dtlocal().strftime("%Y%m%d")

    def get_fetchformatteddate(self):
        return self.get_dtlocal().strftime("%Y-%m-%d")

    def get_fetchday(self):
        dt_utc = datetime.datetime.utcnow()
        # convert UTC to local hk time - at around HKT 6.00am, all news are available
        dt_local = dt_utc - datetime.timedelta(-2.0/24)
        return self.get_dtlocal().strftime("%d")

    def get_cover_url(self):
        cover = 'http://news.mingpao.com/' + self.get_fetchdate() + '/' + self.get_fetchdate() + '_' + self.get_fetchday() + 'gacov.jpg'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(cover)
        except:
            cover = None
        return cover

    def parse_index(self):
        feeds = []
        dateStr = self.get_fetchdate()
        if __UseLife__:
            for title, url, keystr in [(u'\u8981\u805e Headline', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalga', 'nal'),
                                       (u'\u6e2f\u805e Local', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalgb', 'nal'),
                                       (u'\u6559\u80b2 Education', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalgf', 'nal'),
                                       (u'\u793e\u8a55/\u7b46\u9663 Editorial', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalmr', 'nal'),
                                       (u'\u8ad6\u58c7 Forum', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalfa', 'nal'),
                                       (u'\u4e2d\u570b China', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalca', 'nal'),
                                       (u'\u570b\u969b World', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalta', 'nal'),
                                       (u'\u7d93\u6fdf Finance', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalea', 'nal'),
                                       (u'\u9ad4\u80b2 Sport', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalsp', 'nal'),
                                       (u'\u5f71\u8996 Film/TV', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalma', 'nal'),
                                       (u'\u5c08\u6b04 Columns', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=ncolumn', 'ncl')]:
                articles = self.parse_section2(url, keystr)
                if articles:
                    feeds.append((title, articles))

            for title, url in [(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
                               (u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
                articles = self.parse_section(url)
                if articles:
                    feeds.append((title, articles))
        else:
            for title, url in [(u'\u8981\u805e Headline', 'http://news.mingpao.com/' + dateStr + '/gaindex.htm'),
                               (u'\u6e2f\u805e Local', 'http://news.mingpao.com/' + dateStr + '/gbindex.htm'),
                               (u'\u6559\u80b2 Education', 'http://news.mingpao.com/' + dateStr + '/gfindex.htm')]:
                articles = self.parse_section(url)
                if articles:
                    feeds.append((title, articles))

            # special - editorial
            ed_articles = self.parse_ed_section('http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalmr')
            if ed_articles:
                feeds.append((u'\u793e\u8a55/\u7b46\u9663 Editorial', ed_articles))

            for title, url in [(u'\u8ad6\u58c7 Forum', 'http://news.mingpao.com/' + dateStr + '/faindex.htm'),
                               (u'\u4e2d\u570b China', 'http://news.mingpao.com/' + dateStr + '/caindex.htm'),
                               (u'\u570b\u969b World', 'http://news.mingpao.com/' + dateStr + '/taindex.htm')]:
                articles = self.parse_section(url)
                if articles:
                    feeds.append((title, articles))

            # special - finance
            #fin_articles = self.parse_fin_section('http://www.mpfinance.com/htm/Finance/' + dateStr + '/News/ea,eb,ecindex.htm')
            fin_articles = self.parse_fin_section('http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalea')
            if fin_articles:
                feeds.append((u'\u7d93\u6fdf Finance', fin_articles))

            for title, url in [('Tech News', 'http://news.mingpao.com/' + dateStr + '/naindex.htm'),
                               (u'\u9ad4\u80b2 Sport', 'http://news.mingpao.com/' + dateStr + '/spindex.htm')]:
                articles = self.parse_section(url)
                if articles:
                    feeds.append((title, articles))

            # special - entertainment
            ent_articles = self.parse_ent_section('http://ol.mingpao.com/cfm/star1.cfm')
            if ent_articles:
                feeds.append((u'\u5f71\u8996 Film/TV', ent_articles))

            for title, url in [(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
                               (u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
                articles = self.parse_section(url)
                if articles:
                    feeds.append((title, articles))

            # special - columns
            col_articles = self.parse_col_section('http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=ncolumn')
            if col_articles:
                feeds.append((u'\u5c08\u6b04 Columns', col_articles))

        return feeds

    # parse from news.mingpao.com
    def parse_section(self, url):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        divs = soup.findAll(attrs={'class': ['bullet','bullet_grey']})
        current_articles = []
        included_urls = []
        divs.reverse()
        for i in divs:
            a = i.find('a', href=True)
            title = self.tag_to_string(a)
            url = a.get('href', False)
            url = 'http://news.mingpao.com/' + dateStr + '/' + url
            if url not in included_urls and url.rfind('Redirect') == -1:
                current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
                included_urls.append(url)
        current_articles.reverse()
        return current_articles

    # parse from life.mingpao.com
    def parse_section2(self, url, keystr):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        a = soup.findAll('a', href=True)
        a.reverse()
        current_articles = []
        included_urls = []
        for i in a:
            title = self.tag_to_string(i)
            url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
            if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind(keystr) == -1):
                current_articles.append({'title': title, 'url': url, 'description': ''})
                included_urls.append(url)
        current_articles.reverse()
        return current_articles

    def parse_ed_section(self, url):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        a = soup.findAll('a', href=True)
        a.reverse()
        current_articles = []
        included_urls = []
        for i in a:
            title = self.tag_to_string(i)
            url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
            if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind('nal') == -1):
                current_articles.append({'title': title, 'url': url, 'description': ''})
                included_urls.append(url)
        current_articles.reverse()
        return current_articles

    def parse_fin_section(self, url):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        a = soup.findAll('a', href=True)
        current_articles = []
        included_urls = []
        for i in a:
            #url = 'http://www.mpfinance.com/cfm/' + i.get('href', False)
            url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
            #if url not in included_urls and not url.rfind(dateStr) == -1 and url.rfind('index') == -1:
            if url not in included_urls and (not url.rfind('txt') == -1) and (not url.rfind('nal') == -1):
                title = self.tag_to_string(i)
                current_articles.append({'title': title, 'url': url, 'description':''})
                included_urls.append(url)
        return current_articles

    def parse_ent_section(self, url):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        a = soup.findAll('a', href=True)
        a.reverse()
        current_articles = []
        included_urls = []
        for i in a:
            title = self.tag_to_string(i)
            url = 'http://ol.mingpao.com/cfm/' + i.get('href', False)
            if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind('star') == -1):
                current_articles.append({'title': title, 'url': url, 'description': ''})
                included_urls.append(url)
        current_articles.reverse()
        return current_articles

    def parse_col_section(self, url):
        dateStr = self.get_fetchdate()
        soup = self.index_to_soup(url)
        a = soup.findAll('a', href=True)
        a.reverse()
        current_articles = []
        included_urls = []
        for i in a:
            title = self.tag_to_string(i)
            url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
            if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind('ncl') == -1):
                current_articles.append({'title': title, 'url': url, 'description': ''})
                included_urls.append(url)
        current_articles.reverse()
        return current_articles

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(style=True):
            del item['width']
        for item in soup.findAll(stype=True):
            del item['absmiddle']
        return soup

    def create_opf(self, feeds, dir=None):
        if dir is None:
            dir = self.output_dir
        if __UseChineseTitle__ == True:
            title = u'\u660e\u5831 (\u9999\u6e2f)'
        else:
            title = self.short_title()
        # if not generating a periodical, force date to apply in title
        if __MakePeriodical__ == False:
            title = title + ' ' + self.get_fetchformatteddate()
        if True:
            mi = MetaInformation(title, [self.publisher])
            mi.publisher = self.publisher
            mi.author_sort = self.publisher
            if __MakePeriodical__ == True:
                mi.publication_type = 'periodical:'+self.publication_type+':'+self.short_title()
            else:
                mi.publication_type = self.publication_type+':'+self.short_title()
            #mi.timestamp = nowf()
            mi.timestamp = self.get_dtlocal()
            mi.comments = self.description
            if not isinstance(mi.comments, unicode):
                mi.comments = mi.comments.decode('utf-8', 'replace')
            #mi.pubdate = nowf()
            mi.pubdate = self.get_dtlocal()
            opf_path = os.path.join(dir, 'index.opf')
            ncx_path = os.path.join(dir, 'index.ncx')
            opf = OPFCreator(dir, mi)
            # Add mastheadImage entry to <guide> section
            mp = getattr(self, 'masthead_path', None)
            if mp is not None and os.access(mp, os.R_OK):
                from calibre.ebooks.metadata.opf2 import Guide
                ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
                ref.type = 'masthead'
                ref.title = 'Masthead Image'
                opf.guide.append(ref)
            manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
            manifest.append(os.path.join(dir, 'index.html'))
            manifest.append(os.path.join(dir, 'index.ncx'))
            # Get cover
            cpath = getattr(self, 'cover_path', None)
            if cpath is None:
                pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
                if self.default_cover(pf):
                    cpath = pf.name
            if cpath is not None and os.access(cpath, os.R_OK):
                opf.cover = cpath
                manifest.append(cpath)
            # Get masthead
            mpath = getattr(self, 'masthead_path', None)
            if mpath is not None and os.access(mpath, os.R_OK):
                manifest.append(mpath)
            opf.create_manifest_from_files_in(manifest)
            for mani in opf.manifest:
                if mani.path.endswith('.ncx'):
                    mani.id = 'ncx'
                if mani.path.endswith('mastheadImage.jpg'):
                    mani.id = 'masthead-image'
            entries = ['index.html']
            toc = TOC(base_path=dir)
            self.play_order_counter = 0
            self.play_order_map = {}

            def feed_index(num, parent):
                f = feeds[num]
                for j, a in enumerate(f):
                    if getattr(a, 'downloaded', False):
                        adir = 'feed_%d/article_%d/'%(num, j)
                        auth = a.author
                        if not auth:
                            auth = None
                        desc = a.text_summary
                        if not desc:
                            desc = None
                        else:
                            desc = self.description_limiter(desc)
                        entries.append('%sindex.html'%adir)
                        po = self.play_order_map.get(entries[-1], None)
                        if po is None:
                            self.play_order_counter += 1
                            po = self.play_order_counter
                        parent.add_item('%sindex.html'%adir, None, a.title if a.title else _('Untitled Article'),
                                        play_order=po, author=auth, description=desc)
                        last = os.path.join(self.output_dir, ('%sindex.html'%adir).replace('/', os.sep))
                        for sp in a.sub_pages:
                            prefix = os.path.commonprefix([opf_path, sp])
                            relp = sp[len(prefix):]
                            entries.append(relp.replace(os.sep, '/'))
                            last = sp
                        if os.path.exists(last):
                            with open(last, 'rb') as fi:
                                src = fi.read().decode('utf-8')
                            soup = BeautifulSoup(src)
                            body = soup.find('body')
                            if body is not None:
                                prefix = '/'.join('..' for i in range(2*len(re.findall(r'link\d+', last))))
                                templ = self.navbar.generate(True, num, j, len(f),
                                                             not self.has_single_feed,
                                                             a.orig_url, self.publisher, prefix=prefix,
                                                             center=self.center_navbar)
                                elem = BeautifulSoup(templ.render(doctype='xhtml').decode('utf-8')).find('div')
                                body.insert(len(body.contents), elem)
                                with open(last, 'wb') as fi:
                                    fi.write(unicode(soup).encode('utf-8'))

            if len(feeds) == 0:
                raise Exception('All feeds are empty, aborting.')

            if len(feeds) > 1:
                for i, f in enumerate(feeds):
                    entries.append('feed_%d/index.html'%i)
                    po = self.play_order_map.get(entries[-1], None)
                    if po is None:
                        self.play_order_counter += 1
                        po = self.play_order_counter
                    auth = getattr(f, 'author', None)
                    if not auth:
                        auth = None
                    desc = getattr(f, 'description', None)
                    if not desc:
                        desc = None
                    feed_index(i, toc.add_item('feed_%d/index.html'%i, None,
                                               f.title, play_order=po, description=desc, author=auth))
            else:
                entries.append('feed_%d/index.html'%0)
                feed_index(0, toc)

            for i, p in enumerate(entries):
                entries[i] = os.path.join(dir, p.replace('/', os.sep))
            opf.create_spine(entries)
            opf.set_toc(toc)

            with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
                opf.render(opf_file, ncx_file)
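The three Taiwan recipes that follow are much simpler than the Ming Pao recipe above: they rely entirely on Calibre's built-in RSS handling, so they need no parse_index() or custom OPF code. As a rough sketch of that pattern (the class name, section names, and URLs below are placeholders, not taken from any recipe in this post), a feed-based recipe only needs a title and a feeds list; everything else is optional tuning:
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

# Hypothetical skeleton for illustration only - not one of the recipes posted here.
class ExampleFeedRecipe(BasicNewsRecipe):
    title = u'Example Paper'
    language = 'zh-TW'
    encoding = 'big5'            # match the encoding the site actually serves
    oldest_article = 1           # how many days back to fetch
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    # each entry is (section name, RSS URL); Calibre builds the sections automatically
    feeds = [(u'Section A', u'http://example.com/rss/a.xml'),
             (u'Section B', u'http://example.com/rss/b.xml')]
    # optional: keep only the article body when downloading the full pages
    keep_only_tags = [dict(name='div', attrs={'id': 'article'})]
Any of the recipes in this post can also be tried before official inclusion by pasting the code into Calibre's "Add a custom news source" dialog and scheduling a download.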
Code:
__license__ = 'GPL v3'
# dug from https://www.mobileread.com/forums/showthread.php?p=1012294

from calibre.web.feeds.recipes import BasicNewsRecipe  # note: import added here; not in the original post

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'中時電子報'
    oldest_article = 1
    max_articles_per_feed = 100
    feeds = [(u'焦點', u'http://rss.chinatimes.com/rss/focus-u.rss'),
             (u'政治', u'http://rss.chinatimes.com/rss/Politic-u.rss'),
             (u'社會', u'http://rss.chinatimes.com/rss/social-u.rss'),
             (u'國際', u'http://rss.chinatimes.com/rss/international-u.rss'),
             (u'兩岸', u'http://rss.chinatimes.com/rss/mainland-u.rss'),
             (u'地方', u'http://rss.chinatimes.com/rss/local-u.rss'),
             (u'言論', u'http://rss.chinatimes.com/rss/comment-u.rss'),
             (u'科技', u'http://rss.chinatimes.com/rss/technology-u.rss'),
             (u'運動', u'http://rss.chinatimes.com/rss/sport-u.rss'),
             (u'藝文', u'http://rss.chinatimes.com/rss/philology-u.rss'),
             #(u'旺報', u'http://rss.chinatimes.com/rss/want-u.rss'),
             #(u'財經', u'http://rss.chinatimes.com/rss/finance-u.rss'),  # broken links
             #(u'股市', u'http://rss.chinatimes.com/rss/stock-u.rss')     # broken links
             ]
    __author__ = 'einstuerzende, updated by Eddie Lau'
    __version__ = '1.0'
    language = 'zh-TW'
    publisher = 'China Times Group'
    description = 'China Times (Taiwan)'
    category = 'News, Chinese, Taiwan'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://www.fcuaa.org/gif/chinatimeslogo.gif'
    cover_url = 'http://www.fcuaa.org/gif/chinatimeslogo.gif'
    keep_only_tags = [dict(name='div', attrs={'class':['articlebox','articlebox clearfix']})]
    remove_tags = [dict(name='div', attrs={'class':['focus-news']})]
Code:
__license__ = 'GPL v3'

from calibre.web.feeds.recipes import BasicNewsRecipe  # note: import added here; not in the original post

class UnitedDaily(BasicNewsRecipe):
    title = u'聯合新聞網'
    oldest_article = 1
    max_articles_per_feed = 100
    feeds = [(u'焦點', u'http://udn.com/udnrss/focus.xml'),
             (u'政治', u'http://udn.com/udnrss/politics.xml'),
             (u'社會', u'http://udn.com/udnrss/social.xml'),
             (u'生活', u'http://udn.com/udnrss/life.xml'),
             (u'綜合', u'http://udn.com/udnrss/education.xml'),
             (u'意見評論', u'http://udn.com/udnrss/opinion.xml'),
             (u'大台北', u'http://udn.com/udnrss/local_taipei.xml'),
             (u'桃竹苗', u'http://udn.com/udnrss/local_tyhcml.xml'),
             (u'中彰投', u'http://udn.com/udnrss/local_tcchnt.xml'),
             (u'雲嘉南', u'http://udn.com/udnrss/local_ylcytn.xml'),
             (u'高屏離島', u'http://udn.com/udnrss/local_ksptisland.xml'),
             (u'基宜花東', u'http://udn.com/udnrss/local_klilhltt.xml'),
             (u'台灣百寶鄉', u'http://udn.com/udnrss/local_oddlyenough.xml'),
             (u'兩岸要聞', u'http://udn.com/udnrss/mainland.xml'),
             (u'國際焦點', u'http://udn.com/udnrss/international.xml'),
             (u'台商經貿', u'http://udn.com/udnrss/financechina.xml'),
             (u'國際財經', u'http://udn.com/udnrss/financeworld.xml'),
             (u'財經焦點', u'http://udn.com/udnrss/financesfocus.xml'),
             (u'股市要聞', u'http://udn.com/udnrss/stock.xml'),
             (u'股市快訊', u'http://udn.com/udnrss/stklatest.xml'),
             (u'稅務法務', u'http://udn.com/udnrss/tax.xml'),
             (u'房市情報', u'http://udn.com/udnrss/houses.xml'),
             (u'棒球', u'http://udn.com/udnrss/baseball.xml'),
             (u'籃球', u'http://udn.com/udnrss/basketball.xml'),
             (u'體壇動態', u'http://udn.com/udnrss/sportsfocus.xml'),
             (u'熱門星聞', u'http://udn.com/udnrss/starsfocus.xml'),
             (u'廣電港陸', u'http://udn.com/udnrss/tv.xml'),
             (u'海外星球', u'http://udn.com/udnrss/starswestern.xml'),
             (u'日韓星情', u'http://udn.com/udnrss/starsjk.xml'),
             (u'電影世界', u'http://udn.com/udnrss/movie.xml'),
             (u'流行音樂', u'http://udn.com/udnrss/music.xml'),
             (u'觀點專題', u'http://udn.com/udnrss/starssubject.xml'),
             (u'食樂指南', u'http://udn.com/udnrss/food.xml'),
             (u'折扣好康', u'http://udn.com/udnrss/shopping.xml'),
             (u'醫藥新聞', u'http://udn.com/udnrss/health.xml'),
             (u'家婦繽紛', u'http://udn.com/udnrss/benfen.xml'),
             (u'談星論命', u'http://udn.com/udnrss/astrology.xml'),
             (u'文化副刊', u'http://udn.com/udnrss/reading.xml'),
             ]
    extra_css = '''div[id='story_title'] {font-size:200%; font-weight:bold;}'''
    __author__ = 'Eddie Lau'
    __version__ = '1.0'
    language = 'zh-TW'
    publisher = 'United Daily News Group'
    description = 'United Daily (Taiwan)'
    category = 'News, Chinese, Taiwan'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://udn.com/NEWS/2004/images/logo_udn.gif'
    cover_url = 'http://udn.com/NEWS/2004/images/logo_udn.gif'
    keep_only_tags = [dict(name='div', attrs={'id':['story_title','story_author', 'story']})]
    remove_tags = [dict(name='div', attrs={'id':['mvouter']})]
Code:
__license__ = 'GPL v3'
# dug from https://www.mobileread.com/forums/showthread.php?p=1012294

from calibre.web.feeds.recipes import BasicNewsRecipe  # note: import added here; not in the original post

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'自由電子報'
    oldest_article = 1
    max_articles_per_feed = 100
    feeds = [(u'焦點新聞', u'http://www.libertytimes.com.tw/rss/fo.xml'),
             (u'政治新聞', u'http://www.libertytimes.com.tw/rss/p.xml'),
             (u'生活新聞', u'http://www.libertytimes.com.tw/rss/life.xml'),
             (u'國際新聞', u'http://www.libertytimes.com.tw/rss/int.xml'),
             (u'自由廣場', u'http://www.libertytimes.com.tw/rss/o.xml'),
             (u'社會新聞', u'http://www.libertytimes.com.tw/rss/so.xml'),
             (u'體育新聞', u'http://www.libertytimes.com.tw/rss/sp.xml'),
             (u'財經焦點', u'http://www.libertytimes.com.tw/rss/e.xml'),
             (u'證券理財', u'http://www.libertytimes.com.tw/rss/stock.xml'),
             (u'影視焦點', u'http://www.libertytimes.com.tw/rss/show.xml'),
             (u'北部新聞', u'http://www.libertytimes.com.tw/rss/north.xml'),
             (u'中部新聞', u'http://www.libertytimes.com.tw/rss/center.xml'),
             (u'南部新聞', u'http://www.libertytimes.com.tw/rss/south.xml'),
             (u'大台北新聞', u'http://www.libertytimes.com.tw/rss/taipei.xml'),
             (u'藝術文化', u'http://www.libertytimes.com.tw/rss/art.xml'),
             ]
    extra_css = '''span[class='insubject1'][id='newtitle'] {font-size:200%; font-weight:bold;}'''
    __author__ = 'einstuerzende, updated by Eddie Lau'
    __version__ = '1.1'
    language = 'zh-HANT'
    publisher = 'Liberty Times Group'
    description = 'Liberty Times (Taiwan)'
    category = 'News, Chinese, Taiwan'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://www.libertytimes.com.tw/2008/images/img_auto/005/logo_new.gif'
    cover_url = 'http://www.libertytimes.com.tw/2008/images/img_auto/005/logo_new.gif'
    keep_only_tags = [dict(name='td', attrs={'id':['newsContent']})]

Last edited by tylau0; 05-12-2011 at 01:51 PM.