Old 06-25-2010, 06:54 AM   #15
rty
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
It works well with my recipe for BBC Chinese (http://www.bbc.co.uk/zhongwen/simp/indepth/index.xml).

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'BBC Chinese'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [
        # Front-page feed (disabled)
        # (u'\u4e3b\u9875', u'http://www.bbc.co.uk/zhongwen/simp/index.xml'),
        # In-depth feed, the one this recipe was tested on
        (u'\u5206\u6790\u8bc4\u8bba', u'http://www.bbc.co.uk/zhongwen/simp/indepth/index.xml'),
    ]

    # Use the reader's built-in CJK fallback font for body text and headings
    extra_css = '''
        @font-face {font-family: "DroidFont", serif, sans-serif; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}
        h1 {font-family: 'DroidFont', serif, sans-serif}
    '''

    __author__ = 'rty'
    __version__ = '1.0'
    language = 'zh-HANS'
    publisher = 'British Broadcasting Corporation'
    description = 'BBC news in Chinese'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'UTF-8'
    conversion_options = {'linearize_tables': True}
    masthead_url = 'http://wscdn.bbc.co.uk/zhongwen/simp/images/1024/brand.jpg'

    # Keep only the headline, the summary lines, the body text and the date stamp
    keep_only_tags = [
        dict(name='h1'),
        dict(name='p', attrs={'class': ['primary-topic', 'summary']}),
        dict(name='div', attrs={'class': ['bodytext', 'datestamp']}),
    ]


But there is still a problem on the XML feed page. Please look at the first photo: the article summary/description lines on the feed page show up as ??????? characters. The articles themselves are fine.

From what I have observed, the problem only happens on XML feed pages with UTF-8 encoding.

Any idea how to solve this?
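
In case it helps narrow things down, here is a minimal sketch of one way Chinese text can turn into question marks. It assumes the summary text is being encoded somewhere along the way to a character set that cannot represent it. That is only a guess at the mechanism, not calibre's actual code, and the variable names are made up for illustration.

```python
# Hypothetical illustration only; this is not how calibre processes feeds.
summary = u'\u5206\u6790\u8bc4\u8bba'  # the feed title used in the recipe

# Encoding to a codec that has no Chinese characters, with errors='replace',
# swaps every character it cannot represent for '?'.
lossy = summary.encode('ascii', 'replace')

# Encoding to UTF-8 and decoding with the same codec returns the text unchanged.
round_trip = summary.encode('utf-8').decode('utf-8')

print(lossy)
print(round_trip == summary)
```

If something like this is what happens, the original characters are already gone by the time the ? marks appear, so the fix would have to be wherever the summary text gets encoded, not in the fonts or CSS.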

Last edited by rty; 06-25-2010 at 06:56 AM.