#1 |
onlinenewsreader.net
Posts: 327
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
lxml.etree._utf8 crash
This is probably a question for Kovid.
I'm getting a trap in lxml.etree._utf8 with the message "ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters". With recursions=0 and simultaneous_downloads=1 this crashes ebook-convert with the following traceback:

Code:

    Python function terminated unexpectedly
      All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters (Error Code: 1)
    Traceback (most recent call last):
      File "site.py", line 132, in main
      File "site.py", line 109, in run_entry_point
      File "site-packages\calibre\ebooks\conversion\cli.py", line 325, in main
      File "site-packages\calibre\ebooks\conversion\plumber.py", line 979, in run
      File "site-packages\calibre\customize\conversion.py", line 208, in __call__
      File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 105, in convert
      File "site-packages\calibre\web\feeds\news.py", line 881, in download
      File "site-packages\calibre\web\feeds\news.py", line 1130, in build_index
      File "site-packages\calibre\web\feeds\news.py", line 974, in feed2index
      File "site-packages\calibre\web\feeds\templates.py", line 43, in generate
      File "site-packages\calibre\web\feeds\templates.py", line 177, in _generate
      File "site-packages\lxml\builder.py", line 222, in __call__
      File "site-packages\lxml\builder.py", line 185, in add_text
      File "lxml.etree.pyx", line 916, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:36134)
      File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:17141)
      File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
    ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Code:

    Parsing feed_1/article_4/index.html as HTML
    HTML 5 parsing failed, falling back to older parsers
    Traceback (most recent call last):
      File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 259, in parse_html
      File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 86, in html5_parse
      File "site-packages\html5lib\html5parser.py", line 38, in parse
      File "site-packages\html5lib\html5parser.py", line 211, in parse
      File "site-packages\html5lib\html5parser.py", line 111, in _parse
      File "site-packages\html5lib\html5parser.py", line 179, in mainLoop
      File "site-packages\html5lib\html5parser.py", line 447, in processStartTag
      File "site-packages\html5lib\html5parser.py", line 725, in startTagMeta
      File "site-packages\html5lib\treebuilders\_base.py", line 259, in insertElementNormal
      File "site-packages\html5lib\treebuilders\etree_lxml.py", line 219, in _setAttributes
      File "site-packages\html5lib\treebuilders\etree_lxml.py", line 189, in __init__
      File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
      File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
      File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
    ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I've looked at the calibre source at http://bazaar.launchpad.net/~kovid/calibre/trunk/files and the line numbers in the tracebacks don't seem to line up, so I'm at a loss here.

My question: what is causing this, and could calibre be made a little more bulletproof here?
#2 |
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Your second traceback does not indicate that anything crashed, just that parsing with the HTML 5 parser failed, in which case calibre falls back to using other parsers.
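For readers unfamiliar with the pattern described here, a minimal sketch of a parser fallback chain follows. It is not calibre's code: `parse_with_fallback`, `strict_parser`, and `lenient_parser` are hypothetical names, and the two parsers are toy stand-ins that only illustrate how a failure in the first parser is logged and then handed to the next one instead of aborting.

```python
import logging

log = logging.getLogger("parse_fallback")


def parse_with_fallback(data, parsers):
    """Try each (name, parser) pair in order and return the first success.

    `parsers` is a hypothetical list of (name, callable) pairs; it is not
    calibre's real parser list.
    """
    last_error = None
    for name, parser in parsers:
        try:
            return name, parser(data)
        except Exception as err:
            # A failed parser is logged, not fatal: move on to the next one.
            log.warning("%s parsing failed, falling back: %s", name, err)
            last_error = err
    raise ValueError("all parsers failed") from last_error


def _is_control(ch):
    # Control characters other than tab, newline and carriage return.
    return ord(ch) < 0x20 and ch not in "\t\n\r"


def strict_parser(data):
    # Toy stand-in for a strict parser that rejects control characters.
    if any(_is_control(ch) for ch in data):
        raise ValueError("control character in input")
    return ("strict", data)


def lenient_parser(data):
    # Toy stand-in for a forgiving parser that drops what it cannot handle.
    return ("lenient", "".join(ch for ch in data if not _is_control(ch)))
```

Called with `[("strict", strict_parser), ("lenient", lenient_parser)]`, input containing a NULL byte is reported as a warning by the first parser and then handled by the second.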
|
#3 |
onlinenewsreader.net
|
What about the first traceback? It's the same error and traceback and ebook-convert crashes. I use recursions=0 and simultaneous_downloads=1 for recipe debugging purposes and this crash makes things very difficult.
|
#4 |
onlinenewsreader.net
|
OK, I ran this from source, and the problem is some garbage characters in an article description. I think calibre should "fail softly" when encountering invalid character codes, since recipes aren't able to control that and it happens from time to time on periodical websites--crashing isn't a good response.
Ignoring the illegal characters and issuing a warning message would be a much better response. Unfortunately this is something that should be done at the lxml level, so making calibre more robust in this case is probably a task for Kovid rather than someone like me (I don't think I have the source for lxml as part of the bazaar download of calibre). I realize that this is only an issue when calibre is running single-threaded, but still--it's a limitation for people who want to debug recipes single-threaded!
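The "ignore the illegal characters and issue a warning" behaviour proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `clean_xml_text` is a hypothetical helper, not a calibre or lxml function, and the character class follows the `Char` production of the XML 1.0 specification (tab, newline, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF).

```python
import logging
import re

log = logging.getLogger("xml_sanitize")

# Anything outside the XML 1.0 "Char" production is treated as illegal.
_ILLEGAL_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(text, field="text"):
    """Drop XML-incompatible characters and warn instead of raising.

    `field` is only used to make the warning message easier to trace.
    """
    cleaned, count = _ILLEGAL_XML.subn("", text)
    if count:
        log.warning(
            "Removed %d XML-incompatible character(s) from %s", count, field
        )
    return cleaned
```

A caller would pass each string through `clean_xml_text` before handing it to an XML builder, so that a stray NULL byte in a description costs one warning line rather than the whole download.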
#5 |
creator of calibre
|
Why is fixing lxml a task for me? I don't maintain lxml. You can get the lxml source at https://launchpad.net/lxml
Though you will find that creating a parser that never fails, no matter what garbage you feed it, is well-nigh impossible.
#6 |
onlinenewsreader.net
|
I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it, but I can tell from your snarky attitude that discussing this further is pointless.
|
#7 |
creator of calibre
|
#8 |
onlinenewsreader.net
|
If you cast your eyes back over this (very short) thread you won't find any suggestion that you should "drop everything and rush off" to fix this. I merely alerted you to the problem.
Your responses have been quite rude and inappropriate. If you don't care about an issue there is no need to get nasty about it. |
#9 |
creator of calibre
|
Let's see. Quoting you:
"I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it"
"Unfortunately this is something that should be done at the lxml level so making calibre more robust in this case is probably a task for Kovid"
#10 | |
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
|
Quote:
I don't think that attempting to bully the developer with side-swipe comments is a good way to enlist his help in fixing a problem in a third-party component. Perhaps you could make a non-monetary contribution by having a go at fixing the lxml problem yourself, now that Kovid has pointed you to the source.
|
#11 | |
onlinenewsreader.net
|
Quote:
|
|
#12 | |
onlinenewsreader.net
|
Quote:
|
|
#13 |
creator of calibre
|
|
#14 |
onlinenewsreader.net
|
OK kettle, in the loop at line 160 in templates.py, why not just wrap the li=LI(... and li.append(... statements in try:/except:, put a message in the log, and continue on the exception? Chances are the only reason those two statements would fail is illegal character codes. That has nothing to do with parsing structure, so falling back to another parser wouldn't fix it.
I'm not suggesting you "rush off" and do that immediately though!
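The suggestion above amounts to a log-and-continue loop. A rough sketch of that idea, independent of calibre's actual template code, might look like the following; `build_items`, `make_item`, and `strict_item` are hypothetical names, and `strict_item` merely imitates a builder that raises ValueError on a NULL byte.

```python
import logging

log = logging.getLogger("feed_index")


def build_items(articles, make_item):
    """Build one index entry per article, skipping any that raise ValueError.

    `make_item` is a hypothetical stand-in for the LI(...) construction
    mentioned in the post; it is not calibre's real code.
    """
    items, skipped = [], []
    for i, article in enumerate(articles):
        try:
            items.append(make_item(article))
        except ValueError as err:
            # Log the problem and carry on with the next article.
            log.warning("Skipping article %d in index: %s", i, err)
            skipped.append(i)
    return items, skipped


def strict_item(article):
    # Imitates a builder that rejects NULL bytes, as in the traceback.
    if "\x00" in article["title"]:
        raise ValueError("All strings must be XML compatible")
    return "<li>%s</li>" % article["title"]
```

The trade-off is that a skipped article silently disappears from the index unless someone reads the log, which is why returning the skipped positions alongside the items can be useful while debugging a recipe.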
#15 |
creator of calibre
|
Because, pot, illegal characters are already stripped from both the title and the text_summary in the __init__ method of the Article class. So your exception isn't because of illegal characters.
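If the title and summary are already clean, one way to narrow things down is to check every string that reaches the index builder. The sketch below is a hypothetical diagnostic, not part of calibre: `find_xml_problems` takes a plain dict of field names to values. It flags characters outside the XML 1.0 `Char` production and also non-ASCII byte strings, on the assumption (suggested by the "Unicode or ASCII" wording of the error message) that those may be rejected as well.

```python
import re

# Anything outside the XML 1.0 "Char" production is treated as illegal.
_ILLEGAL_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_xml_problems(fields):
    """Return {field name: reason} for values a strict XML builder may reject.

    `fields` is a hypothetical dict such as {"title": ..., "author": ...}.
    """
    problems = {}
    for name, value in fields.items():
        if isinstance(value, bytes):
            try:
                value = value.decode("ascii")
            except UnicodeDecodeError:
                problems[name] = "non-ASCII byte string"
                continue
        if isinstance(value, str):
            # Collect the distinct offending code points for the report.
            bad = sorted(
                {"U+%04X" % ord(ch) for ch in _ILLEGAL_XML.findall(value)}
            )
            if bad:
                problems[name] = "illegal characters: " + ", ".join(bad)
    return problems
```

Running it over all of an article's fields, not just the title and summary, would show whether some other value is the one that triggers the ValueError.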
|