01-29-2017, 09:28 PM   #3
edwardwong
Device: Kindle Paperwhite 2016
Hi,

I just updated the recipe to include the RSS feeds for the print edition, along with the image fix. Just replace the corresponding portions of the recipe with the code below.

Some articles are showing up blank (header, but no body), e.g. in the "Opinion" section. I'm still trying to figure that one out.


Code:
    feeds = [
        (u'Top of the News', u'http://www.straitstimes.com/print/top-of-the-news/rss.xml'),
        (u'World', u'http://www.straitstimes.com/print/world/rss.xml'),
        (u'Home', u'http://www.straitstimes.com/print/home/rss.xml'),
        (u'Business', u'http://www.straitstimes.com/print/business/rss.xml'),
        (u'Life', u'http://www.straitstimes.com/print/life/rss.xml'),
        (u'Science', u'http://www.straitstimes.com/print/science/rss.xml'),
        (u'Digital', u'http://www.straitstimes.com/print/digital/rss.xml'),
        (u'Insight', u'http://www.straitstimes.com/print/insight/rss.xml'),
        (u'Opinion', u'http://www.straitstimes.com/print/opinion/rss.xml'),
        (u'Forum', u'http://www.straitstimes.com/print/forum/rss.xml'),
        (u'Big Picture', u'http://www.straitstimes.com/print/big-picture/rss.xml'),
        (u'Community', u'http://www.straitstimes.com/print/community/rss.xml'),
        (u'Education', u'http://www.straitstimes.com/print/education/rss.xml'),
    ]

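    def preprocess_html(self, soup):
        # Copy the first srcset candidate URL into src, then blank srcset
        # so that only src is used for the image.
        for img in soup.findAll('img', srcset=True):
            img['src'] = img['srcset'].partition(' ')[0]
            img['srcset'] = ''
        return soup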
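
For anyone curious what the srcset line is doing, here is a minimal standalone sketch that runs outside calibre. The helper name first_srcset_url and the example.com URLs are made up for illustration and are not part of the recipe. It is slightly more forgiving than the one-liner above, since it also tolerates leading whitespace and a comma stuck to the first URL, but it is only a sketch and not a full srcset parser.

```python
def first_srcset_url(srcset):
    """Return the URL of the first candidate in a srcset attribute value.

    Hypothetical helper for illustration only; not part of the recipe.
    """
    # A srcset value looks like "small.jpg 480w, large.jpg 800w":
    # whitespace-separated tokens, where the first token is the first URL.
    tokens = srcset.split()
    if not tokens:
        return ''
    # If the first candidate has no descriptor, the URL token may end with
    # the comma that separates candidates, so strip it.
    return tokens[0].rstrip(',')


if __name__ == '__main__':
    # Made-up URLs, just to show the call.
    sample = 'http://example.com/small.jpg 480w, http://example.com/large.jpg 800w'
    print(first_srcset_url(sample))
```

Taking the first candidate keeps things simple. It is not necessarily the largest or best-quality image in the list.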