MobileRead Forums - View Single Post - Create eBook from MDN & FB Developer sites via Calibre

kovidgoyal · 07-14-2015, 10:04 PM

I don't actually see what is garbled in your output. Are you saying that the output contains HTML tags instead of normal text?

If so, you can implement preprocess_raw_html() in your recipe to fix the parsing, something like this:

Code:

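    def preprocess_raw_html(self, raw, url):
        # Re-parse the raw page with html5lib to fix the parsing
        from lxml import etree
        import html5lib
        from calibre.utils.cleantext import clean_xml_chars
        root = html5lib.parse(
            clean_xml_chars(raw), treebuilder='lxml',
            namespaceHTMLElements=False)
        # encoding='unicode' makes lxml return text rather than bytes
        return etree.tostring(root, encoding='unicode')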
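
If you want to try the cleaning step outside calibre, here is a rough stand-in for clean_xml_chars(). It is only a sketch: the name strip_invalid_xml_chars is made up for illustration, and it assumes the goal is simply to drop characters that XML 1.0 does not allow. The function calibre actually ships may behave differently.

```python
import re

# Characters that XML 1.0 forbids: C0 control characters other than
# tab, LF and CR, the surrogate range, and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r'[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]')


def strip_invalid_xml_chars(text):
    """Hypothetical stand-in for clean_xml_chars(); not calibre's code."""
    return _INVALID_XML_CHARS.sub('', text)
```

Inside a recipe you would keep using calibre's own clean_xml_chars(); the stand-in is only useful for testing the parsing step in a plain Python session.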
