India Today Magazine articles have long dashes between words which show up as
—
Adding the utf-8 encoding didn't work, so I added this:
Code:
def preprocess_raw_html(self, raw_html, url):
    return raw_html.replace('—', '--')
Is there another way to do this?
Maybe add the code to the recipe!
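For anyone who wants to try the replacement outside calibre first, here is a rough sketch. The class name IndiaTodaySketch and the sample string are made up for illustration; in an actual recipe the method would sit inside the class that subclasses BasicNewsRecipe. I'm also assuming the long dash is U+2014 and writing it as an escape so it can't get mangled when copying.

```python
EM_DASH = '\u2014'  # assumption: the long dash in the articles is U+2014


class IndiaTodaySketch:
    """Hypothetical stand-in; a real recipe would subclass BasicNewsRecipe."""

    def preprocess_raw_html(self, raw_html, url):
        # Swap every long dash for two plain hyphens before the HTML is parsed
        return raw_html.replace(EM_DASH, '--')


# Made-up sample text, only to show the effect of the replacement
sample = 'India' + EM_DASH + 'Today'
cleaned = IndiaTodaySketch().preprocess_raw_html(sample, 'http://example.com')
print(cleaned)
```

If the dash in the source turns out to be a different character, only the EM_DASH constant would need to change.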
Also this bold part:
Code:
extra_css = '''
#sub-d {font-style:italic; color:#202020;}
.story__byline {font-size:small; text-align:left;}
.body_caption, .mos__alt, .caption, .caption-drupal-entity {font-size:small; text-align:center;}
blockquote{color:#404040;}
'''