#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
Random "HTML 5 parsing failed"
Hi all!
First of all, sorry for my poor English. I would like to improve one of the calibre recipes to add some information and images to the books. I have one big problem: sometimes calibre fails on just one (or two) articles and reports this: Code:
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 277, in parse_html
    data = html5_parse(data)
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 98, in html5_parse
    data = html5lib.parse(clean_xml_chars(data), treebuilder='lxml').getroot()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 27, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 227, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 96, in _parse
    self.mainLoop()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 162, in mainLoop
    currentNodeName = currentNode.name if currentNode is not None else None
  File "/usr/lib/calibre/html5lib/treebuilders/etree_lxml.py", line 226, in _getName
    return infosetFilter.fromXmlName(self._name)
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 276, in fromXmlName
    name = name.replace(item, self.unescapeChar(item))
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 285, in unescapeChar
    return chr(int(charcode[1:], 16))
ValueError: chr() arg not in range(256)

If I run the recipe again, it happens to a different article! It's strange, isn't it? My other goal is to add images to the book, but they don't appear.

Thank you so much for your support!! Best regards

---
Added info:
- rss: http://www.ilfattoquotidiano.it/cate...-palazzo/feed/
- code:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class IlFattoQuotidianoDiISP(BasicNewsRecipe):
    title = u'Il fatto quotidiano ISP'
    oldest_article = 2
    max_articles_per_feed = 5
    language = 'it'
    __author__ = 'isspro'
    encoding = 'utf8'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    # auto_cleanup was assigned twice (True, then False); only the last
    # assignment takes effect, so the single effective value is kept.
    auto_cleanup = False

    # Keep only the article body and its meta bar.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'post-content-container'}),
        dict(name='div', attrs={'id': 'meta-bar'}),
    ]
    # Drop the comments section and the tag list.
    remove_tags = [
        dict(name='div', attrs={'id': 'commenti'}),
        dict(name='div', attrs={'class': 'post-tags'}),
    ]
    # A class selector needs a leading dot (.post-tags).
    extra_css = '''
        h1 {font-size:x-large;}
        h2 {font-size:medium;}
        .post-tags {font-size:xx-small;}
        img {display:block;}
    '''

    feeds = [
        (u'Politica & Palazzo', u'http://www.ilfattoquotidiano.it/category/politica-palazzo/feed/'),
    ]
#2 |
creator of calibre
Posts: 45,260
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That error just means something in the markup is causing the HTML parser to fail. calibre will automatically try a different parser.
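The fallback behaviour described in this reply can be illustrated with a small standalone sketch. This is not calibre's code: `extract_text` and `_TextCollector` are hypothetical names, and the two parsers are standard-library stand-ins (a strict XML parser first, then a lenient HTML parser) chosen only to show the "try strict, fall back to tolerant" pattern.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Lenient fallback parser: collects text and tolerates bad markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup):
    """Return (parser_used, text), trying the strict parser first."""
    try:
        # Strict: any unclosed or mismatched tag raises ParseError.
        root = ET.fromstring(markup)
        return "strict", "".join(root.itertext())
    except ET.ParseError:
        # Fallback: the lenient parser accepts the same input anyway.
        collector = _TextCollector()
        collector.feed(markup)
        collector.close()
        return "fallback", "".join(collector.chunks)
```

With this sketch, well-formed input such as `<p>ok</p>` goes through the strict path, while `<p>broken<br></p>` (an unclosed `<br>`) is handled by the fallback. The message in the log plays the same role as the `"fallback"` label here: it reports which path was taken, not that the article was lost.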
#3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
So, I can't do anything because the HTML itself contains an error, right?
Because of it, one article in five is full of "?" and squares :-D Thank you so much!!
#4 |
creator of calibre
Posts: 45,260
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That has to do with encoding. Set the encoding parameter in the recipe to whatever character encoding the site uses.
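To see why a wrong encoding produces "?" and squares, here is a minimal sketch in plain Python, independent of calibre. The sample word and the helper name `decode_with` are assumptions made for illustration only; the point is that the same bytes read correctly under one encoding and turn into junk under another.

```python
def decode_with(raw, encoding):
    """Decode raw bytes, replacing undecodable bytes with U+FFFD."""
    return raw.decode(encoding, errors="replace")


# Hypothetical sample: an Italian word as a UTF-8 site would send it.
raw = "perch\u00e9".encode("utf-8")  # "perché" -> b'perch\xc3\xa9'

right = decode_with(raw, "utf-8")          # accented letter preserved
wrong_latin1 = decode_with(raw, "latin-1") # two junk characters instead of one letter
wrong_ascii = decode_with(raw, "ascii")    # replacement characters (often shown as "?" or squares)
```

If the recipe's `encoding` does not match the bytes the site actually sends, the result looks like `wrong_latin1` or `wrong_ascii` rather than `right`.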
#5 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
I see! I set "utf8" because I found this in the site's HTML:
Code:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
I guess the problem is caused by some unknown character. I'm very sad, but this is the best I could do. Thank you very much, and thank you for your work! ;-)
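A page can declare one charset and still contain bytes that are not valid in it, which would fit an error that only shows up in some articles. The following is a rough sketch for checking that by hand, assuming the article's raw bytes have been saved somewhere; `declared_charset` and `decodes_cleanly` are hypothetical helpers, not part of calibre, and the regular expression is a simplification of real charset sniffing.

```python
import re

# Simplified pattern: finds charset=... in a <meta> tag near the top of the page.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(html_bytes, scan_limit=4096):
    """Return the charset named in the first scan_limit bytes, or None."""
    match = _CHARSET_RE.search(html_bytes[:scan_limit])
    return match.group(1).decode("ascii").lower() if match else None


def decodes_cleanly(html_bytes, encoding):
    """Return True if every byte is valid in the given encoding."""
    try:
        html_bytes.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

For the meta tag quoted above, `declared_charset` returns `'utf-8'`. Running `decodes_cleanly(article_bytes, 'utf-8')` on a failing article (where `article_bytes` is whatever was downloaded) would show whether that article really is valid UTF-8 or contains stray bytes from another encoding.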
Thread | Thread Starter | Forum | Replies | Last Post |
"Pick Random Book" - not so random?? | Chris_Snow | Library Management | 3 | 09-15-2013 06:44 PM |
Adding "Pick a Random Book" in Sharing over the net | ippopom | Recipes | 2 | 01-13-2013 04:32 AM |
PRS-650 Anyone knows how to fix the random "protected by DRM" message? | nekron | Sony Reader Dev Corner | 1 | 01-19-2011 08:23 AM |
"No Books" in Random Collections on PRS-300 | mockidol | Calibre | 7 | 09-18-2009 08:05 AM |
Seriously thoughtful Random House: "Guter Start für eBooks" | netseeker | Lounge | 1 | 06-16-2009 04:01 PM |