rty,
I've been fumbling around trying to make a recipe for cn.wsj.com, without much success. If you have time and are taking requests, I'd appreciate whatever help you could give. I'm trying to get the Traditional character edition, which I think means putting "big5" in front of every path (e.g. http://cn.wsj.com/big5/20100708/FRX003561.asp).
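For what it's worth, here is roughly how I picture the "big5 in front of everything" idea as a small URL rewriter. It is only a sketch of my guess at the URL scheme: the helper name to_big5_url and the "/big5" prefix handling are my own assumptions, not anything taken from the site or from calibre, and the un-prefixed URL in the usage line is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_big5_url(url, prefix="/big5"):
    """Prepend `prefix` to the path of a cn.wsj.com URL (hypothetical helper).

    Assumption: the Traditional edition lives at the same path with
    "/big5" in front. URLs on other hosts, or URLs that already carry
    the prefix, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "cn.wsj.com":
        return url
    path = parts.path
    # Already pointing at the Traditional edition: leave it alone.
    if path == prefix or path.startswith(prefix + "/"):
        return url
    # Make sure the remaining path starts with a slash before joining.
    if not path.startswith("/"):
        path = "/" + path
    return urlunsplit(
        (parts.scheme, parts.netloc, prefix + path, parts.query, parts.fragment)
    )


print(to_big5_url("http://cn.wsj.com/20100708/FRX003561.asp"))
```

If the guess about the scheme holds, a recipe could call something like this from a URL hook such as print_version so that every article link gets rewritten before download. If the site actually swaps a path segment rather than prepending one, the rewrite would need to replace instead.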
On another note, I made a couple of modifications to the BBC Chinese recipe so that it pulls the Traditional version; otherwise it's pretty much as rty built it. Sharing it in case it's of use to anyone:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'BBC 中文網'
    oldest_article = 7
    max_articles_per_feed = 100

    # Traditional-character ("trad") feeds; titles are Unicode-escaped Chinese.
    feeds = [
        (u'\u4e3b\u9801', u'http://www.bbc.co.uk/zhongwen/trad/index.xml'),  # Home
        (u'\u570B\u969B\u65b0\u805e', u'http://www.bbc.co.uk/zhongwen/trad/world/index.xml'),  # World news
        (u'\u5169\u5CB8\u4E09\u5730', u'http://www.bbc.co.uk/zhongwen/trad/china/index.xml'),  # China
        (u'\u91D1\u878D\u8CA1\u7D93', u'http://www.bbc.co.uk/zhongwen/trad/business/index.xml'),  # Business
        (u'\u7DB2\u4E0A\u4E92\u52D5', u'http://www.bbc.co.uk/zhongwen/trad/interactive/index.xml'),  # Interactive
        (u'\u97F3\u8996\u5716\u7247', u'http://www.bbc.co.uk/zhongwen/trad/multimedia/index.xml'),  # Multimedia
        (u'\u5206\u6790\u8A55\u8AD6', u'http://www.bbc.co.uk/zhongwen/trad/indepth/index.xml')  # Analysis
    ]

    # Use the device's fallback CJK font so the Chinese text renders.
    extra_css = '''
        @font-face {font-family: "DroidFont", serif, sans-serif; src: url(res:///system/fonts/DroidSansFallback.ttf); }\n
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}\n
        h1 {font-family: 'DroidFont', serif;}\n
        .articledescription {font-family: 'DroidFont', serif;}
        '''

    __author__ = 'rty'
    __version__ = '1.0'
    language = 'zh-HANT'
    publisher = 'British Broadcasting Corporation'  # was misspelled 'pubisher'
    description = 'BBC news in Chinese'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'UTF-8'
    conversion_options = {'linearize_tables': True}
    masthead_url = 'http://wscdn.bbc.co.uk/zhongwen/trad/images/1024/brand.jpg'

    # Keep only the headline, summary, body text, and date stamp.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='p', attrs={'class': ['primary-topic', 'summary']}),
        dict(name='div', attrs={'class': ['bodytext', 'datestamp']}),
    ]