10-26-2009, 07:35 AM   #1
gsz
Device: Sony PRS-505
'utf8' codec can't decode bytes error (HTML to EPUB conversion)

Hello,

I'm using Calibre 0.6.19 to convert a bunch of HTML (a CHM originally) to EPUB and I'm getting this error:

Creating EPUB Output...
Looking for large trees in ch06lev1sec8.html...
No large trees found
Looking for large trees in fm01lev1sec1.html...
No large trees found
Looking for large trees in ch14lev1sec2.html...
No large trees found
Splitting on page-break
Splitting on page-break
Python function terminated unexpectedly
('utf8', '/*/*[2]/P\xf7/\x04:h2', 9, 13, 'invalid data') (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 19, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 827, in run
File "site-packages\calibre\ebooks\epub\output.py", line 162, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 56, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 66, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 175, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 215, in split_on_page_breaks
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 283, in do_split
File "lxml.etree.pyx", line 1621, in lxml.etree._ElementTree.getpath (src/lxml/lxml.etree.c:17041)
File "apihelpers.pxi", line 1130, in lxml.etree.funicode (src/lxml/lxml.etree.c:36925)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-12: invalid data

Any idea how I can fix this? I can't even figure out where to look for the invalid data. Would it be somewhere in ch14lev1sec2.html, since that's the last file listed? The header in that file declares ISO-8859-1, not UTF-8...