#1 |
|
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
Random "HTML 5 parsing failed"
Hi all!
First, sorry for my poor English. I would like to improve one of the calibre recipes so that it adds some information and images to the books. I have one big problem: sometimes calibre fails and returns this for only one (or two) articles:
Code:
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 277, in parse_html
    data = html5_parse(data)
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 98, in html5_parse
    data = html5lib.parse(clean_xml_chars(data), treebuilder='lxml').getroot()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 27, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 227, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/calibre/html5lib/html5parser.py", line 96, in _parse
    self.mainLoop()
  File "/usr/lib/calibre/html5lib/html5parser.py", line 162, in mainLoop
    currentNodeName = currentNode.name if currentNode is not None else None
  File "/usr/lib/calibre/html5lib/treebuilders/etree_lxml.py", line 226, in _getName
    return infosetFilter.fromXmlName(self._name)
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 276, in fromXmlName
    name = name.replace(item, self.unescapeChar(item))
  File "/usr/lib/calibre/html5lib/ihatexml.py", line 285, in unescapeChar
    return chr(int(charcode[1:], 16))
ValueError: chr() arg not in range(256)
If I run the recipe again, the same thing happens to a different article. Strange, isn't it? My other goal is to add images to the book, but they do not appear.
Thank you so much for your support!
Best regards
---
Added info:
- rss: http://www.ilfattoquotidiano.it/cate...-palazzo/feed/
- code:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class IlFattoQuotidianoDiISP(BasicNewsRecipe):
    title = u'Il fatto quotidiano ISP'
    oldest_article = 2  # in days
    max_articles_per_feed = 5
    language = 'it'
    __author__ = 'isspro'
    encoding = 'utf8'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    # auto_cleanup was assigned twice (True, then False). Only the last
    # assignment takes effect, so the single False is kept.
    auto_cleanup = False

    # Keep only the article body and its meta bar.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'post-content-container'}),
        dict(name='div', attrs={'id': 'meta-bar'}),
    ]
    # Drop the comments and the tag list.
    remove_tags = [
        dict(name='div', attrs={'id': 'commenti'}),
        dict(name='div', attrs={'class': 'post-tags'}),
    ]
    # post-tags is used as a class above, so the selector needs a leading dot.
    extra_css = '''
        h1 {font-size:x-large;}
        h2 {font-size:medium;}
        .post-tags {font-size:xx-small;}
        img {display:block;}
    '''
    feeds = [(u'Politica & Palazzo', u'http://www.ilfattoquotidiano.it/category/politica-palazzo/feed/')]
#2 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That error just means something in the markup is causing the HTML 5 parser to fail; calibre will automatically try a different parser.
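The last two frames of the traceback can be imitated outside calibre. What follows is only a minimal sketch, not html5lib's actual code: the function names are hypothetical, and it assumes that a character not allowed in an XML name is stored as "U" plus five hex digits, which matches the `chr(int(charcode[1:], 16))` line in the traceback. The "not in range(256)" message is what an 8-bit `chr()` (as in Python 2) raises for any code point above 255.

```python
# Sketch of the escape scheme hinted at by the traceback (names are hypothetical).
# Assumption: a character that is illegal in an XML name is stored as
# "U" followed by five uppercase hex digits of its code point.


def escape_char(char):
    """Encode one character as U + five hex digits."""
    return "U%05X" % ord(char)


def unescape_char_8bit(code):
    """Decode the way an 8-bit chr() would: values above 255 are rejected."""
    value = int(code[1:], 16)
    if value > 255:
        raise ValueError("chr() arg not in range(256)")
    return chr(value)


def unescape_char(code):
    """Decode with a Unicode-aware chr(), as in Python 3."""
    return chr(int(code[1:], 16))


colon = escape_char(":")      # code point 0x3A, fits in one byte
euro = escape_char("\u20ac")  # code point 0x20AC, does not

print(colon, unescape_char_8bit(colon))
print(euro, unescape_char(euro))
try:
    unescape_char_8bit(euro)
except ValueError as err:
    print("failed:", err)
```

So a single non-Latin-1 character in an element or attribute name would be enough to trigger the fallback, which may explain why only some articles are affected.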
#3 |
|
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
So I can't do anything, because the HTML itself contains an error, right?
Because of it, one article in five is full of "?" and square characters :-D Thank you so much!
#4 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That has to do with encoding; set the encoding parameter in the recipe to whatever character encoding the site uses.
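To illustrate why a mismatched encoding shows up as "?" and squares, here is a small stand-alone sketch in plain Python. It does not use calibre, and the sample word and the cp1252 encoding are assumptions chosen only for the demonstration.

```python
# Sample text is an assumption: a typical accented Italian word.
text = "Perch\u00e9"  # "Perché"

utf8_bytes = text.encode("utf-8")      # b'Perch\xc3\xa9'
cp1252_bytes = text.encode("cp1252")   # b'Perch\xe9'

# UTF-8 bytes read as a single-byte encoding:
# the accented letter turns into two unrelated characters.
mojibake = utf8_bytes.decode("cp1252")

# Single-byte bytes read as UTF-8: the accented byte is not valid UTF-8,
# so it is replaced by U+FFFD, which many readers draw as "?" or a square.
replaced = cp1252_bytes.decode("utf-8", errors="replace")

print(repr(mojibake))
print(repr(replaced))
```

If the articles show the second kind of damage, the bytes are probably not in the encoding the recipe declares.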
#5 |
|
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2014
Device: Kindle
|
I see! I set "utf8" because I found this in the site's HTML:
Code:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
I guess the problem is with some unknown character. That's a pity, but it's the best I could do. Thank you very much, and thank you for your work! ;-)
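One way to hunt for that unknown character is to compare the charset a page declares with what its bytes really contain. The sketch below is plain Python with hypothetical helper names (`declared_charset`, `first_bad_byte`) and a made-up sample page; it is not part of calibre.

```python
import re

# Matches charset=... inside a meta tag (hypothetical helper pattern).
META_CHARSET = re.compile(rb"charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(html_bytes):
    """Return the charset named near the top of the page, lowercased, or None."""
    match = META_CHARSET.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def first_bad_byte(html_bytes, encoding):
    """Return the offset of the first byte that cannot be decoded, or None."""
    try:
        html_bytes.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start
    return None


# Made-up page: declares UTF-8 but contains a cp1252-encoded "é" (byte 0xE9).
page = (b'<meta http-equiv="content-type" content="text/html; charset=UTF-8" />'
        b"<p>Perch\xe9</p>")

charset = declared_charset(page)
offset = first_bad_byte(page, charset)
print(charset, offset, page[offset:offset + 1])
```

A `None` offset would mean the bytes really are valid in the declared encoding, and the trouble lies elsewhere.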