MobileRead Forums - 8.12 breaks from calibre.ebooks.oeb.parse_utils import RECOVER_PARSER

Terisa de morgan · 10-02-2025, 04:53 PM

Quote:

Originally Posted by kovidgoyal

RECOVER_PARSER is gone because of bugs in lxml on Windows (https://bugs.launchpad.net/lxml/+bug/2125756). I don't know why any plugins would have been using it; the correct way to parse HTML is to use the parse_html function from calibre.ebooks.oeb.parse_utils. But if plugins want to parse HTML or XML using lxml directly, the relevant functions are safe_xml_fromstring and safe_html_fromstring from the calibre.utils.xml_parse module. And if they really, really want to use RECOVER_PARSER, they can simply define it themselves as

Code:

from lxml import etree
RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True, resolve_entities=False)

Note that I strongly recommend against using RECOVER_PARSER as it is fundamentally broken thanks to the bug in lxml linked to above.

I checked one plugin: it does not use RECOVER_PARSER for parsing HTML (it uses parse_html for that) but for parsing XML. Is there a calibre function for that?
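For the XML case, the quoted reply already names safe_xml_fromstring from calibre.utils.xml_parse. Below is a minimal sketch of how a plugin might switch to it. The wrapper name parse_plugin_xml, the sample XML, and the fallback branch are assumptions for illustration and do not come from calibre or from any particular plugin. The fallback only lets the sketch run outside calibre; it reuses the parser definition from the quote, so it carries the same lxml caveat.

```python
from lxml import etree

try:
    # Inside calibre: the helper named in the quoted reply.
    from calibre.utils.xml_parse import safe_xml_fromstring
except ImportError:
    # Outside calibre (assumption: only lxml is installed), fall back to the
    # parser definition given in the quote. The quoted reply recommends
    # against relying on this parser because of the linked lxml bug.
    _FALLBACK_PARSER = etree.XMLParser(
        recover=True, no_network=True, resolve_entities=False
    )

    def safe_xml_fromstring(raw):
        return etree.fromstring(raw, parser=_FALLBACK_PARSER)


def parse_plugin_xml(raw):
    """Hypothetical plugin helper: parse raw XML and return the root element."""
    return safe_xml_fromstring(raw)


# Illustrative input only; a real plugin would pass its own XML data.
root = parse_plugin_xml(b"<metadata><title>Example</title></metadata>")
print(root.tag, root.findtext("title"))
```

With a wrapper like this, the plugin code that previously passed RECOVER_PARSER to lxml only needs to call one function, and the import from calibre.ebooks.oeb.parse_utils can be dropped.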