07-31-2012, 02:43 PM   #1
nickredding
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
lxml.etree._utf8 crash

This is probably a question for Kovid.

I'm getting a trap in lxml.etree._utf8 with the message "ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters".

With recursions=0 and simultaneous_downloads=1, this crashes ebook-convert with the following traceback:

Code:
Python function terminated unexpectedly
  All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\ebooks\conversion\cli.py", line 325, in main
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 979, in run
  File "site-packages\calibre\customize\conversion.py", line 208, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 881, in download
  File "site-packages\calibre\web\feeds\news.py", line 1130, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 974, in feed2index
  File "site-packages\calibre\web\feeds\templates.py", line 43, in generate
  File "site-packages\calibre\web\feeds\templates.py", line 177, in _generate
  File "site-packages\lxml\builder.py", line 222, in __call__
  File "site-packages\lxml\builder.py", line 185, in add_text
  File "lxml.etree.pyx", line 916, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:36134)
  File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:17141)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

With recursions set to 1 and simultaneous_downloads left at the default, ebook-convert doesn't crash, but the following traceback does appear, indicating that a subprocess of the main ebook-convert process crashed:

Code:
Parsing feed_1/article_4/index.html as HTML
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 259, in parse_html
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 86, in html5_parse
  File "site-packages\html5lib\html5parser.py", line 38, in parse
  File "site-packages\html5lib\html5parser.py", line 211, in parse
  File "site-packages\html5lib\html5parser.py", line 111, in _parse
  File "site-packages\html5lib\html5parser.py", line 179, in mainLoop
  File "site-packages\html5lib\html5parser.py", line 447, in processStartTag
  File "site-packages\html5lib\html5parser.py", line 725, in startTagMeta
  File "site-packages\html5lib\treebuilders\_base.py", line 259, in insertElementNormal
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 219, in _setAttributes
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 189, in __init__
  File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
  File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

In that case, feed_1/article_4/index.html is sitting in the debug-pipeline directories looking happy as a clam, so I'm not sure what is going on here.
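
One way to narrow this down is to scan the saved file for characters that fall outside the XML 1.0 "Char" production, since the second traceback fails while setting an attribute on a meta tag. The following is only a sketch: `find_xml_illegal` is a hypothetical helper, not part of calibre or lxml, and it assumes Python 3 and that the XML 1.0 character set matches what lxml is rejecting here.

```python
import re

# Anything NOT in the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_xml_illegal(text):
    """Return (offset, code point) pairs for characters XML 1.0 disallows.

    Hypothetical diagnostic helper; not a calibre or lxml function.
    """
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in _XML_ILLEGAL.finditer(text)
    ]


if __name__ == "__main__":
    # Path taken from the traceback above; adjust to wherever the
    # debug-pipeline output actually lives. The utf-8 encoding is an assumption.
    path = "feed_1/article_4/index.html"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for offset, codepoint in find_xml_illegal(handle.read()):
            print(offset, codepoint)
```

If this prints nothing for the saved file, the offending character is probably introduced or decoded somewhere later in the pipeline rather than being present in the file on disk.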

I've looked at the calibre source at http://bazaar.launchpad.net/~kovid/calibre/trunk/files and the line numbers in the tracebacks don't seem to line up, so I'm at a loss here.

My question: what is causing this and could calibre be made a little more bulletproof here?
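
To illustrate what "more bulletproof" could mean, here is a minimal sketch of stripping XML-incompatible characters from a string before it is handed to lxml. This is not how calibre handles it; `strip_xml_illegal` is a hypothetical name, the code assumes Python 3, and it assumes the XML 1.0 "Char" production is the right definition of what lxml accepts.

```python
import re

# Anything NOT in the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text, replacement=""):
    """Remove (or replace) characters that XML 1.0 does not allow.

    Hypothetical helper for illustration; not a calibre or lxml function.
    """
    return _XML_ILLEGAL.sub(replacement, text)


# Example: NULL and backspace are dropped, tab and newline survive.
cleaned = strip_xml_illegal("ab\x00c\x08d\te\n")
```

A recipe could apply something like this to feed titles, descriptions, and attribute values before they reach the template code, at the cost of silently dropping the bad characters.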