MobileRead Forums > E-Book Software > Calibre > Recipes
01-22-2014, 08:17 AM   #1
issproevolution
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
Random "HTML 5 parsing failed"

Hi all!
First of all, sorry for my poor English.
I would like to improve one of calibre's recipes so that it adds some information and images to the books.
I have one bigger problem: sometimes calibre fails and returns this for only one (or two) articles:

Code:
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 277, in parse_html
    data = html5_parse(data)
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 98, in html5_parse
    data = html5lib.parse(clean_xml_chars(data), treebuilder='lxml').getroot()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 27, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 227, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 96, in _parse
    self.mainLoop()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 162, in mainLoop
    currentNodeName = currentNode.name if currentNode is not None else None
  File "/usr/lib/calibre/html5lib/treebuilders/etree_lxml.py", line 226, in _getName
    return infosetFilter.fromXmlName(self._name)
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 276, in fromXmlName
    name = name.replace(item, self.unescapeChar(item))
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 285, in unescapeChar
    return chr(int(charcode[1:], 16))
ValueError: chr() arg not in range(256)
But it happens randomly!
If I run the recipe again, it happens to a different article. Strange, isn't it?
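The last frame of the traceback calls chr(int(charcode[1:], 16)). Calibre ran on Python 2 at the time, where chr() only accepts values 0 to 255 (unichr() covers the rest), so any code point above U+00FF at that spot raises exactly this ValueError. The name fromXmlName suggests html5lib is undoing an escape it applied to a tag or attribute name. The sketch below reproduces the range check in Python 3; py2_chr and unescape_name_char are hypothetical helpers, and the "letter followed by hex digits" shape of charcode is an assumption inferred from the charcode[1:] slice.

```python
def py2_chr(code):
    """Emulate Python 2's chr(): only code points 0..255 are accepted."""
    if not 0 <= code < 256:
        raise ValueError("chr() arg not in range(256)")
    return chr(code)


def unescape_name_char(charcode, chr_func=py2_chr):
    """Hypothetical mirror of the traceback line chr(int(charcode[1:], 16)).

    charcode is assumed to be one marker letter followed by hex digits.
    """
    return chr_func(int(charcode[1:], 16))


# Usage
ok = unescape_name_char("U000E9")                # U+00E9 fits in 0..255
fine_in_py3 = unescape_name_char("U02019", chr)  # Python 3 chr() accepts it
try:
    unescape_name_char("U02019")                 # U+2019 > 255, fails as in Python 2
except ValueError as exc:
    error_message = str(exc)
```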

My other goal is to add images to the book, but they don't appear.

Thank you so much for the support!
Best regards




---

Added info:
- RSS feed: http://www.ilfattoquotidiano.it/cate...-palazzo/feed/
- Code:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class IlFattoQuotidianoDiISP(BasicNewsRecipe):
    title                 = u'Il fatto quotidiano ISP'
    oldest_article        = 2      # in days
    max_articles_per_feed = 5
    language              = 'it'
    __author__            = 'isspro'
    encoding              = 'utf8'

    no_stylesheets        = True
    use_embedded_content  = False  # always download the full article page
    remove_javascript     = True
    auto_cleanup          = False  # clean up manually with the tag lists below

    # Keep only the article body and the meta bar
    keep_only_tags = [
        dict(name='div', attrs={'class': 'post-content-container'}),
        dict(name='div', attrs={'id': 'meta-bar'}),
    ]

    # Drop the comments section and the tag list
    remove_tags = [
        dict(name='div', attrs={'id': 'commenti'}),
        dict(name='div', attrs={'class': 'post-tags'}),
    ]

    extra_css = '''
        h1 {font-size: x-large;}
        h2 {font-size: medium;}
        .post-tags {font-size: xx-small;}
        img {display: block;}
    '''

    feeds = [(u'Politica & Palazzo', u'http://www.ilfattoquotidiano.it/category/politica-palazzo/feed/')]
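A recipe attribute such as auto_cleanup should be assigned only once. In a Python class body a later assignment simply rebinds the name, so if an attribute is set twice, only the last value is what calibre reads from the class. A tiny, calibre-free illustration using a hypothetical DemoRecipe class:

```python
class DemoRecipe(object):
    # Hypothetical stand-in for a recipe class; no calibre import needed.
    auto_cleanup = True
    language = 'it'
    auto_cleanup = False  # rebinds the name; the True above is discarded


# The class keeps a single value for the attribute: the last one assigned.
effective = DemoRecipe.auto_cleanup
```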
01-22-2014, 10:11 PM   #2
kovidgoyal
creator of calibre
Posts: 45,260
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That error just means something in the markup is causing the HTML 5 parser to fail; calibre will automatically fall back to a different parser.
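A minimal sketch of the "try a strict parser, then fall back" idea, using only Python's standard library: xml.etree.ElementTree stands in for the strict parser and html.parser for the lenient one. This illustrates the pattern only; it is not calibre's own parse_html code, and parse_with_fallback is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Lenient fallback: html.parser tolerates unbalanced or malformed tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_with_fallback(markup):
    """Try a strict parser first and fall back to a lenient one on failure.

    Returns (parser_used, extracted_text).
    """
    try:
        root = ET.fromstring(markup)
        return "strict", "".join(root.itertext())
    except ET.ParseError:
        # Comparable in spirit to calibre's "falling back to older parsers".
        collector = _TextCollector()
        collector.feed(markup)
        collector.close()
        return "lenient", "".join(collector.chunks)


good = parse_with_fallback("<p>ok</p>")          # well-formed markup
bad = parse_with_fallback("<p>broken<b></p>")    # mismatched tags
```

The fallback loses structure but still recovers the text, which is why a parse failure on its own does not stop the download.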
01-23-2014, 02:32 AM   #3
issproevolution
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
So there's nothing I can do, because the HTML itself has an error, right?
Because of it, one article in five is full of "?" and squares :-D

Thank you so much!
01-23-2014, 02:34 AM   #4
kovidgoyal
creator of calibre
Posts: 45,260
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is an encoding issue: set the encoding parameter in the recipe to whatever character encoding the site uses.
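A minimal illustration, using Python's built-in codecs, of how a mismatched encoding produces the "?" and squares described earlier in the thread (the sample sentence is made up): the same UTF-8 bytes decode correctly as UTF-8, turn into mojibake when read as Latin-1, and become U+FFFD replacement characters (often rendered as "?" or a box) when undecodable bytes are replaced.

```python
text = "perch\u00e9 \u00e8 cos\u00ec"   # sample Italian text: "perche e cosi" with accents
raw = text.encode("utf-8")               # bytes as a UTF-8 site would send them

as_utf8 = raw.decode("utf-8")                     # right encoding: round-trips
as_latin1 = raw.decode("latin-1")                 # wrong encoding: mojibake
as_ascii = raw.decode("ascii", errors="replace")  # each bad byte becomes U+FFFD
```

Each accented letter is two bytes in UTF-8, so a wrong guess doubles it into two junk characters, which is a quick way to spot an encoding mismatch in downloaded articles.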
01-23-2014, 03:33 AM   #5
issproevolution
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
I see! I set "utf8" because this is what I found in the site's HTML:
Code:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
I stopped the conversion halfway and took a look at the temp files: I found two HTML files that had been saved incorrectly.
I guess there is a problem with some unknown character.
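Since the page declares UTF-8, one way to locate the badly saved spots in such temp files is to decode them strictly and report where decoding first fails. A sketch using only the standard library; declared_charset and first_bad_byte are hypothetical helpers, and the regex only handles the simple charset=... form shown in the meta tag above.

```python
import re

# Matches the simple form charset=UTF-8 (optionally quoted) inside the HTML.
_CHARSET_RE = re.compile(rb'charset=["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(raw):
    """Return the charset named in the HTML bytes, lower-cased, or None."""
    match = _CHARSET_RE.search(raw)
    return match.group(1).decode("ascii").lower() if match else None


def first_bad_byte(raw, encoding):
    """Return the offset of the first byte that fails to decode, or None."""
    try:
        raw.decode(encoding)  # strict decoding raises on invalid input
    except UnicodeDecodeError as exc:
        return exc.start
    return None


page = (b'<meta http-equiv="content-type" '
        b'content="text/html; charset=UTF-8" />'
        b'<p>caff\xc3\xa8</p>')       # "caffe" + U+00E8, correctly UTF-8 encoded
broken = page + b'<p>caff\xe8</p>'    # same word saved as Latin-1 instead

charset = declared_charset(page)
ok_offset = first_bad_byte(page, charset)
bad_offset = first_bad_byte(broken, charset)
```

Here ok_offset is None, while bad_offset points at the stray Latin-1 byte, so the surrounding text can be printed to see which character broke.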

I'm very sad, but it's the best I could do.
Thank you very much, and thank you for your work! ;-)
Similar Threads
Thread Thread Starter Forum Replies Last Post
"Pick Random Book" - not so random?? Chris_Snow Library Management 3 09-15-2013 06:44 PM
Adding "Pick a Random Book" in Sharing over the net ippopom Recipes 2 01-13-2013 04:32 AM
PRS-650 Anyone knows how to fix the random "protected by DRM" message? nekron Sony Reader Dev Corner 1 01-19-2011 08:23 AM
"No Books" in Random Collections on PRS-300 mockidol Calibre 7 09-18-2009 08:05 AM
Seriously thoughtful Random House: "Guter Start für eBooks" netseeker Lounge 1 06-16-2009 04:01 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.