Yesterday, 03:42 AM   #1780
VapidRapidReader
From another thread:
Quote:
Originally Posted by kovidgoyal
RECOVER_PARSER is gone because of bugs in lxml on Windows (https://bugs.launchpad.net/lxml/+bug/2125756). I don't know why any plugins would have been using it; the correct way to parse HTML is to use the parse_html function from calibre.ebooks.oeb.parse_utils. But if plugins want to parse HTML or XML using lxml directly, the relevant functions are safe_xml_fromstring and safe_html_fromstring from the calibre.utils.xml_parse module. And if they really, really want to use RECOVER_PARSER, they can simply define it themselves as

Code:
from lxml import etree
RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True, resolve_entities=False)
Note that I strongly recommend against using RECOVER_PARSER as it is fundamentally broken thanks to the bug in lxml linked to above.
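For plugin authors who would rather follow the recommended route, here is a rough sketch of a shim that prefers safe_xml_fromstring and only falls back to something else when calibre is not importable (for example when testing a plugin's parsing code outside calibre). The helper name get_xml_fromstring and both fallbacks are my own assumptions, not anything from calibre. The lxml fallback is a strict, non-recovering parser, and I make no claim about whether it sidesteps the linked lxml bug. The stdlib fallback is there only so the snippet runs on a bare Python install.

```python
def get_xml_fromstring():
    """Return (source_name, parse_function) for the best available XML parser.

    get_xml_fromstring is a hypothetical helper name, not a calibre API.
    """
    try:
        # Recommended in the quoted post; importable only inside calibre.
        from calibre.utils.xml_parse import safe_xml_fromstring
        return "calibre", safe_xml_fromstring
    except ImportError:
        pass

    try:
        from lxml import etree
    except ImportError:
        # Assumption: last resort so the sketch runs without lxml installed.
        # The stdlib parser has no recover mode and is not hardened.
        import xml.etree.ElementTree as ET
        return "stdlib", ET.fromstring

    # Assumption: a strict lxml parser that keeps the network and entity
    # restrictions from the quoted definition but drops recover=True.
    parser = etree.XMLParser(no_network=True, resolve_entities=False)
    return "lxml", lambda data: etree.fromstring(data, parser=parser)


if __name__ == "__main__":
    source, xml_fromstring = get_xml_fromstring()
    root = xml_fromstring(b"<package><metadata><title>Example</title></metadata></package>")
    print(source, root.tag, root.find("metadata/title").text)
```

Inside calibre the first branch should always win, so the fallbacks only matter for standalone testing. Passing bytes rather than str avoids lxml's complaint about Unicode strings that carry an encoding declaration.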