Yesterday, 05:32 AM   #7
JSWolf
Resident Curmudgeon
Posts: 80,415
Karma: 150231975
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by kovidgoyal
RECOVER_PARSER is gone because of bugs in lxml on Windows, https://bugs.launchpad.net/lxml/+bug/2125756. I don't know why any plugins would have been using it; the correct way to parse HTML is to use the parse_html function from calibre.oeb.parse_utils. But if plugins want to parse HTML or XML using lxml directly, the relevant functions are safe_xml_fromstring and safe_html_fromstring from the calibre.utils.xml_parse module. And if they really, really want to use RECOVER_PARSER, they can simply define it themselves as:

Code:
from lxml import etree
RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True, resolve_entities=False)
Note that I strongly recommend against using RECOVER_PARSER as it is fundamentally broken thanks to the bug in lxml linked to above.
Thank you very much. The fix is very easy as you've said. I've been updating some of the plugins that use RECOVER_PARSER and posting them for others.
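
For anyone patching a plugin themselves, here is a rough sketch of what the change might look like. It assumes the plugin only needs to turn an XML string into an lxml element, and that safe_xml_fromstring can be called with just the raw data. The helper name parse_xml is made up for illustration, and the fallback branch simply repeats the definition from the quote, so the warning about RECOVER_PARSER applies to it as well.

```python
# Sketch only: parse_xml is a hypothetical helper name, not part of calibre.
try:
    # Preferred route inside calibre: the helper named in the quoted post.
    from calibre.utils.xml_parse import safe_xml_fromstring

    def parse_xml(raw):
        # Assumption: passing only the raw data is sufficient here.
        return safe_xml_fromstring(raw)

except ImportError:
    # Fallback when calibre is not importable (e.g. testing outside calibre).
    # This is the same parser definition given in the quote, with the same
    # caveat that recovery may misbehave because of the linked lxml bug.
    from lxml import etree

    RECOVER_PARSER = etree.XMLParser(
        recover=True, no_network=True, resolve_entities=False
    )

    def parse_xml(raw):
        return etree.fromstring(raw, parser=RECOVER_PARSER)


# Example use in plugin code that previously passed parser=RECOVER_PARSER.
root = parse_xml(b"<package><metadata>demo</metadata></package>")
first_child_text = root[0].text
```

A plugin would then replace its etree.fromstring(data, parser=RECOVER_PARSER) calls with parse_xml(data), which keeps the change to a couple of lines per plugin.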