MobileRead Forums - View Single Post - Recipe turning some punctuation marks as non-printable characters

knowledgecrawler · 08-20-2014, 01:30 PM


Hi,

I am trying to write a recipe that downloads news from a government site, pib.nic.in. While tweaking it I ran into this problem:

When ebook-convert downloads the news, it turns some punctuation marks into junk characters.

Here is the code:

Spoiler:

    from __future__ import with_statement

    __license__ = 'GPL 3'
    __copyright__ = '2014, Amit <amitkp.ias@gmail.com>'

    from calibre.web.feeds.news import BasicNewsRecipe


    class My_Feeds(BasicNewsRecipe):
        title = 'PIB Daily'
        language = 'en_IN'
        oldest_article = 1.2
        __author__ = 'Amit'
        max_articles_per_feed = 100
        no_stylesheets = True
        remove_javascript = True
        center_navbar = True
        use_embedded_content = False
        remove_empty_feeds = True

        # Keep only the ministry header and the article body
        keep_only_tags = [
            dict(id=['ministry']),
            dict(attrs={'class': ['contentdiv']})
        ]

        def preprocess_raw_html(self, raw, url):
            # Quote the unquoted lang attributes found in the page source
            return raw.replace('lang=EN-US', 'lang="en_US"').replace('lang=EN-IN', 'lang="en_IN"')

        def parse_index(self):
            # A single hard-coded article, used for testing
            feeds = []
            current_section = 'Section'
            current_articles = []
            current_articles.append({
                'url': 'http://pib.nic.in/newsite/efeatures.aspx?relid=108697',
                'title': 'Climate Change Issues Need Better Attention',
                'date': '',
                'description': ''
            })
            feeds.append((current_section, current_articles))
            return feeds

The original HTML had:

Spoiler:

    countries to “walk the Talk” in this regard

After running the recipe:

Spoiler:

    countries to â€œwalk the Talkâ€� in this regard

How can I fix it?
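For reference, the junk characters look like what appears when UTF-8 bytes are decoded as Windows-1252. The sketch below reproduces the symptom outside calibre. It assumes the page is served as UTF-8 and is being read with the wrong codec somewhere along the way; that is a guess about the cause, not something confirmed for this site. The variable names are made up for the illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduce the symptom by decoding UTF-8 bytes with the wrong codec.
# Assumption (not confirmed): the page is UTF-8 but gets read as Windows-1252.

# The sentence as it appears in the original HTML, with curly quotes
# U+201C and U+201D written as escapes.
original = u'countries to \u201cwalk the Talk\u201d in this regard'

# What a UTF-8 page would send over the wire.
utf8_bytes = original.encode('utf-8')

# Decode with the wrong codec. Bytes that cp1252 cannot map
# are turned into U+FFFD by errors='replace'.
garbled = utf8_bytes.decode('cp1252', errors='replace')

# The text seen after running the recipe, written with escapes:
# a-circumflex, euro sign, oe ligature / replacement character.
seen_in_ebook = (u'countries to \u00e2\u20ac\u0153walk the Talk'
                 u'\u00e2\u20ac\ufffd in this regard')

# Prints whether the wrong-codec decode matches the text in the ebook.
print(garbled == seen_in_ebook)
```

If the two strings match, the downloaded bytes are probably fine and only the decoding step picks the wrong character set.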