Hi,
I am writing a recipe for a web page whose HTML contains an error that completely confuses BeautifulSoup. With the convenience function index_to_soup I can generate the soup from the HTML file. However, I would first have to use BeautifulSoup's markupMassage feature to remove the errors from the HTML before it is converted into soup:
http://www.crummy.com/software/Beaut...mentation.html
Are there any parameters or other mechanisms to pass a markupMassage list to index_to_soup? I have noticed that the function already does something with the markupMassage parameter when generating the soup:
Code:
massage = list(BeautifulSoup.MARKUP_MASSAGE)
enc = 'cp1252' if callable(self.encoding) or self.encoding is None else self.encoding
massage.append((re.compile(r'&(\S+?);'), lambda match:
    entity_to_unicode(match, encoding=enc)))
return BeautifulSoup(_raw, markupMassage=massage)
So there must be some way of passing my personal markupMassage list to index_to_soup?
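To make it concrete, here is a minimal sketch of the kind of clean-up I would like to apply before the HTML is parsed. It uses only the re module and runs each (compiled regex, replacement) pair over the raw markup, which is how I understand markupMassage to work. The names my_massage and apply_massage, the pattern, and the sample HTML are all made up for illustration; they are not part of calibre or BeautifulSoup.

```python
import re

# Hypothetical massage list in the (compiled regex, replacement function)
# format used by markupMassage. The pattern is only an example of
# broken markup I would want to strip; it is not from the real page.
my_massage = [
    # Remove HTML comments, including ones with stray "--" inside them
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),
]


def apply_massage(raw, massage):
    """Apply each (pattern, replacement) pair to the raw HTML in order."""
    for pattern, replacement in massage:
        raw = pattern.sub(replacement, raw)
    return raw


# Made-up sample input standing in for the downloaded page
sample = '<html><body><p>ok</p><!-- bad -- comment --><p>more</p></body></html>'
cleaned = apply_massage(sample, my_massage)
print(cleaned)
```

What I am missing is the step where a list like my_massage (or the already cleaned string) reaches index_to_soup.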
Thanks,
Jens