Old 07-14-2015, 10:04 PM   #3
kovidgoyal
creator of calibre
I don't actually see what is garbled in your output. Are you saying that the output contains HTML tags instead of normal text?

If so, you can implement preprocess_raw_html() in your recipe to fix the parsing, something like this:

Code:
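    def preprocess_raw_html(self, raw, url):
        # Re-parse the page with html5lib, which tolerates broken markup
        from lxml import etree
        import html5lib
        from calibre.utils.cleantext import clean_xml_chars
        root = html5lib.parse(
            clean_xml_chars(raw), treebuilder='lxml',
            namespaceHTMLElements=False)
        # Serialize to a unicode string ('unicode' works on Python 2 and 3)
        return etree.tostring(root, encoding='unicode')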
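
In case it is not obvious why the raw text is cleaned before parsing: lxml refuses characters that XML 1.0 does not allow, so they have to be removed first. The snippet below is only a rough, standalone sketch of that idea for trying things out outside calibre. The name strip_invalid_xml_chars is made up for illustration, and this is not calibre's actual clean_xml_chars implementation, which may handle more cases.

```python
import re

# Characters XML 1.0 does not allow: C0 control characters other than
# tab, LF and CR, lone surrogates, and the noncharacters U+FFFE / U+FFFF.
# This set is an assumption for illustration, not calibre's exact rule.
_INVALID_XML_CHARS = re.compile(
    r'[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]')


def strip_invalid_xml_chars(text):
    """Hypothetical stand-in: drop characters that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub('', text)
```

Inside a recipe you would keep using clean_xml_chars itself. The sketch only shows the kind of filtering that happens before html5lib builds the lxml tree.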