Old 07-31-2012, 02:43 PM   #1
nickredding
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
lxml.etree._utf8 crash

This is probably a question for Kovid.

I'm getting a trap in lxml.etree._utf8 with the message "ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters"

With recursions=0 and simultaneous_downloads=1, this crashes ebook-convert with the following traceback:

Code:
Python function terminated unexpectedly
  All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\ebooks\conversion\cli.py", line 325, in main
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 979, in run
  File "site-packages\calibre\customize\conversion.py", line 208, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 881, in download
  File "site-packages\calibre\web\feeds\news.py", line 1130, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 974, in feed2index
  File "site-packages\calibre\web\feeds\templates.py", line 43, in generate
  File "site-packages\calibre\web\feeds\templates.py", line 177, in _generate
  File "site-packages\lxml\builder.py", line 222, in __call__
  File "site-packages\lxml\builder.py", line 185, in add_text
  File "lxml.etree.pyx", line 916, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:36134)
  File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:17141)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
With recursions set to 1 and simultaneous_downloads left at the default, the ebook-convert application doesn't crash, but the following traceback does appear, indicating that a subprocess of the main ebook-convert process crashed:

Code:
Parsing feed_1/article_4/index.html as HTML
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 259, in parse_html
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 86, in html5_parse
  File "site-packages\html5lib\html5parser.py", line 38, in parse
  File "site-packages\html5lib\html5parser.py", line 211, in parse
  File "site-packages\html5lib\html5parser.py", line 111, in _parse
  File "site-packages\html5lib\html5parser.py", line 179, in mainLoop
  File "site-packages\html5lib\html5parser.py", line 447, in processStartTag
  File "site-packages\html5lib\html5parser.py", line 725, in startTagMeta
  File "site-packages\html5lib\treebuilders\_base.py", line 259, in insertElementNormal
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 219, in _setAttributes
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 189, in __init__
  File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
  File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
In that case, feed_1/article_4/index.html is sitting in the debug-pipeline directories looking happy as a clam, so I'm not sure what is going on here.

I've looked at the calibre source at http://bazaar.launchpad.net/~kovid/calibre/trunk/files and the line numbers in the tracebacks don't seem to line up, so I'm at a loss here.

My question: what is causing this and could calibre be made a little more bulletproof here?
Old 07-31-2012, 03:22 PM   #2
kovidgoyal
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Your second traceback does not indicate that anything crashed, just that parsing with the HTML 5 parser failed, in which case calibre falls back to using other parsers.
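
The fallback behaviour described here can be pictured roughly as follows. This is only a sketch of the general pattern, not calibre's actual code: parse_with_fallback, strict_parser and lenient_parser are hypothetical names made up for illustration. The point is that a logged traceback from the first parser does not mean the conversion failed.

```python
import logging
import traceback

log = logging.getLogger("parse")  # hypothetical logger name


def parse_with_fallback(data, parsers):
    """Try each (name, parser) pair in order and return the first result.

    A failure is logged together with its traceback, then the next parser
    is tried. Only when every parser fails is an exception raised.
    """
    last_error = None
    for name, parser in parsers:
        try:
            return parser(data)
        except Exception as err:
            last_error = err
            log.warning("%s parsing failed, falling back\n%s",
                        name, traceback.format_exc())
    raise ValueError("all parsers failed") from last_error


def strict_parser(data):
    """Hypothetical strict parser: rejects any control character."""
    if any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in data):
        raise ValueError("control character in input")
    return data


def lenient_parser(data):
    """Hypothetical lenient parser: silently drops control characters."""
    return "".join(ch for ch in data if ord(ch) >= 0x20 or ch in "\t\n\r")
```

With these stand-ins, input containing a stray control character produces a logged traceback from strict_parser and then a normal result from lenient_parser, which matches the shape of the second log above.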
Old 07-31-2012, 03:44 PM   #3
nickredding
What about the first traceback? It's the same error and traceback and ebook-convert crashes. I use recursions=0 and simultaneous_downloads=1 for recipe debugging purposes and this crash makes things very difficult.
Old 07-31-2012, 06:15 PM   #4
nickredding
OK I ran this from source and the problem is some garbage characters in an article description. I think calibre should "fail softly" when encountering invalid character codes since recipes aren't able to control that and it happens from time to time on periodical websites--crashing isn't a good response.

Ignoring the illegal characters and issuing a warning message would be a much better response. Unfortunately this is something that should be done at the lxml level so making calibre more robust in this case is probably a task for Kovid rather than someone like me (I don't think I have the source for lxml as part of the bazaar download of calibre).

I realize that this is only an issue when calibre is running single-threaded but still--it's a limitation for people who want to debug recipes single-threaded!
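
For illustration, the "ignore the illegal characters and issue a warning" idea could look something like the sketch below. It is standalone Python and not part of calibre or lxml; clean_xml_text and the logger name are hypothetical. It assumes the characters to drop are those outside the XML 1.0 Char range (control characters other than tab, LF and CR, surrogates, and U+FFFE/U+FFFF), which may not match lxml's check exactly.

```python
import logging
import re

log = logging.getLogger("recipe-debug")  # hypothetical logger name

# Characters outside the XML 1.0 "Char" production: C0 controls other than
# tab/LF/CR, surrogate code points, and the noncharacters U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def clean_xml_text(text, field="text"):
    """Return text with XML-incompatible characters removed.

    A warning is logged when anything was dropped, so the problem stays
    visible without aborting the whole download.
    """
    cleaned, dropped = _XML_ILLEGAL.subn("", text)
    if dropped:
        log.warning("dropped %d illegal character(s) from %s", dropped, field)
    return cleaned
```

A recipe could call something like this on a description before it reaches the index templates, at the cost of silently altering the source text.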
Old 08-01-2012, 12:30 AM   #5
kovidgoyal
Why is fixing lxml a task for me? I don't maintain lxml. You can get access to the lxml source at https://launchpad.net/lxml

Though you will find that creating a parser that never fails no matter what garbage you feed it, is well-nigh impossible.
Old 08-01-2012, 05:09 PM   #6
nickredding
I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it, but I can tell from your snarky attitude that discussing this further is pointless.
Old 08-02-2012, 12:00 AM   #7
kovidgoyal
I love it when people make the assumption that because I maintain calibre I am somehow obligated to drop everything and rush off to fix whatever they think needs to be fixed. You are asking me to spend time on an issue that is important to you. Do not assume that just because you think it is important, everyone else must share your opinion.
Old 08-02-2012, 12:26 AM   #8
nickredding
If you cast your eyes back over this (very short) thread you won't find any suggestion that you should "drop everything and rush off" to fix this. I merely alerted you to the problem.

Your responses have been quite rude and inappropriate. If you don't care about an issue there is no need to get nasty about it.
Old 08-02-2012, 12:45 AM   #9
kovidgoyal
Let's see. Quoting you:

"I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it"

"Unfortunately this is something that should be done at the lxml level so making calibre more robust in this case is probably a task for Kovid"
Old 08-02-2012, 06:40 AM   #10
Agama
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Quote:
Originally Posted by nickredding View Post
I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it, but I can tell from your snarky attitude that discussing this further is pointless.
Quite frankly it's you that has the snarky attitude! That Kovid is able to make a living from calibre is a testament to the fact that users like his software enough to make monetary contributions. This is open source software which can be used without restrictions for FREE.

I don't think that attempting to bully the developer with side-swipe comments is a good way to enlist his help in fixing a problem in a third party component.

Perhaps you could make a non-monetary contribution by having a go at fixing the lxml problem yourself, now that Kovid has pointed you to the source.
Old 08-02-2012, 08:46 AM   #11
nickredding
Quote:
Originally Posted by kovidgoyal View Post
Let's see. Quoting you:

"I would have thought ensuring calibre is robust would be a task for you since you are making a living off of it"

"Unfortunately this is something that should be done at the lxml level so making calibre more robust in this case is probably a task for Kovid"
You are way too defensive about this.
Old 08-02-2012, 08:47 AM   #12
nickredding
Quote:
Originally Posted by Agama View Post
Quite frankly it's you that has the snarky attitude! That Kovid is able to make a living from calibre is a testament to the fact that users like his software enough to make monetary contributions. This is open source software which can be used without restrictions for FREE.

I don't think that attempting to bully the developer with side-swipe comments is a good way to enlist his help in fixing a problem in a third party component.

Perhaps you could make a non-monetary contribution by having a go at fixing the lxml problem yourself, now that Kovid has pointed you to the source.
I've done a lot for calibre. You just don't know it because I don't trumpet it around.
Old 08-02-2012, 08:49 AM   #13
kovidgoyal
Quote:
Originally Posted by nickredding View Post
You are way too defensive about this.
Pot, meet kettle.
Old 08-02-2012, 09:25 AM   #14
nickredding
Quote:
Originally Posted by kovidgoyal View Post
Pot, meet kettle.
OK kettle, in the loop at line 160 in templates.py, why not just wrap the li=LI(... and li.append(... statements in try:/except:, put a message in the log, and continue on the exception? Chances are the only reason those two statements would fail is illegal character codes. That has nothing to do with parsing structure, so falling back to another parser wouldn't fix it.

I'm not suggesting you "rush off" and do that immediately though!
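
To make the suggestion concrete, here is a rough standalone sketch of the try/except idea. It does not use calibre's templates.py or lxml: make_item is a hypothetical stand-in for the li=LI(... call that raises ValueError on control characters, and build_items and the logger name are likewise made up for illustration.

```python
import logging

log = logging.getLogger("feed-index")  # hypothetical logger name


def make_item(title, summary):
    """Hypothetical stand-in for building one list item.

    Raises ValueError when a string contains a control character other
    than tab, LF or CR, mimicking the failure seen in the traceback.
    """
    for value in (title, summary):
        if any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in value):
            raise ValueError("All strings must be XML compatible")
    return {"title": title, "summary": summary}


def build_items(articles):
    """Build one item per (title, summary) pair.

    An article whose text is rejected is logged and skipped, so one bad
    description does not abort the whole index.
    """
    items = []
    for title, summary in articles:
        try:
            items.append(make_item(title, summary))
        except ValueError as err:
            log.warning("skipping article %r: %s", title, err)
    return items
```

The trade-off is that the offending article disappears from the index entirely, so the log message is the only sign that something was dropped.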
Old 08-02-2012, 09:34 AM   #15
kovidgoyal
Because, pot, illegal characters are already stripped from both the title and the text_summary in the __init__ method of the Article class. So your exception isn't because of illegal characters.
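
One way to narrow down which value is being rejected is a small diagnostic like the sketch below. It is standalone Python with a hypothetical helper name (xml_problems) and is not taken from calibre or lxml. It rests on two assumptions that the thread does not confirm: that text is rejected when it contains characters outside the XML 1.0 Char range, and that byte strings are rejected when they contain non-ASCII bytes (the "Unicode or ASCII" part of the error message hints at this).

```python
def xml_problems(value):
    """Return (index, description) pairs for anything likely to be rejected.

    For bytes: non-ASCII bytes and control bytes other than tab/LF/CR.
    For str: code points outside the XML 1.0 Char production.
    An empty list means nothing suspicious was found.
    """
    problems = []
    if isinstance(value, bytes):
        for i, b in enumerate(value):
            if b >= 0x80:
                problems.append((i, "non-ASCII byte 0x%02x" % b))
            elif b < 0x20 and b not in (0x09, 0x0A, 0x0D):
                problems.append((i, "control byte 0x%02x" % b))
        return problems
    for i, ch in enumerate(value):
        cp = ord(ch)
        allowed = (
            cp in (0x09, 0x0A, 0x0D)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
        if not allowed:
            problems.append((i, "U+%04X" % cp))
    return problems
```

Running this over each string handed to the template (not only the title and summary) would show whether some other field, or an undecoded byte string, is the real culprit.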