05-21-2018, 09:07 AM   #1
bobbysteel
NYTimes - unclosed comment tag

Hi Kovid -
The NYTimes recipe leaves a stray --> (sometimes two of them) in many articles, which looks like an unmatched closing comment tag. I've tried replacing it in both preprocess_html and postprocess_html, but can't figure out how to do so. Any ideas on the best way to remove it from the articles?
Code:
    def postprocess_html(self, soup, first_fetch):
        findcomment = soup.findAll(text = re.compile('--gt&;'))
        for comment in findcomment:
            fixed_text = unicode(comment).replace('--gt&;', '')
            comment.replace_with(fixed_text)
        return soup
This doesn't seem to work: the resulting articles contain nothing but the -->
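
One thing worth noting about the attempt above: the pattern '--gt&;' matches neither the literal --> nor the escaped entity --&gt;, so the search may never find the text it is meant to remove. A possible alternative, offered only as a rough sketch, is to clean the markup as a plain string before it is parsed. The helper name strip_stray_comment_closers is hypothetical, and it assumes the stray marker is already present in the raw HTML at that stage, which has not been verified against the actual NYTimes pages. Well-formed comments are left untouched and only unmatched closers are dropped.

```python
import re

# A complete, well-formed HTML comment. The capturing group makes
# re.split() keep the matched comments in its result list.
COMMENT = re.compile(r'(<!--.*?-->)', re.DOTALL)

# A comment closer, either literal or with the '>' escaped as an entity.
STRAY = re.compile(r'--(?:>|&gt;)')


def strip_stray_comment_closers(raw_html):
    """Remove '-->' / '--&gt;' that do not close a real comment.

    Hypothetical helper, not part of calibre.
    """
    parts = COMMENT.split(raw_html)
    # With one capturing group, odd indices hold the matched comments
    # (kept as-is); even indices hold the text between them (cleaned).
    cleaned = [part if i % 2 else STRAY.sub('', part)
               for i, part in enumerate(parts)]
    return ''.join(cleaned)
```

In a recipe, this could presumably be called from preprocess_raw_html(self, raw_html, url), returning the cleaned string. Whether that hook sees the stray marker before it ends up in the article would still need to be checked.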

Last edited by bobbysteel; 05-21-2018 at 09:15 AM.