10-26-2009, 02:20 PM   #3
gsz
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2009
Device: Sony PRS-505
Hello Kovid,

I appreciate your response, and all the work you put into this application.

I did a couple of things to determine whether ch14lev1sec2.html was the culprit. I converted it to ASCII and compared the result with the original: there was no difference. I did the same with the two CSS files it refers to, again with no difference. So I'm pretty sure this file is valid ISO-8859-1 (in fact, it's plain ASCII).
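A check like the one described above can also be done directly on the raw bytes. This is only a minimal sketch, assuming Python 3; the function names `first_non_ascii` and `is_ascii_file` are made up for illustration and are not part of calibre.

```python
def first_non_ascii(data: bytes):
    """Return (offset, byte value) of the first byte >= 0x80, or None if all ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset, byte
    return None


def is_ascii_file(path):
    """Hypothetical helper: True if the file at `path` contains only ASCII bytes."""
    with open(path, "rb") as fh:
        return first_non_ascii(fh.read()) is None


print(first_non_ascii(b"<h2>Title</h2>"))    # -> None
print(first_non_ascii(b"<h2>Tit\x81le</h2>"))  # -> (7, 129)
```

A file that passes this check is valid ASCII, and therefore also valid ISO-8859-1 and valid UTF-8.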

I then replaced it with an empty HTML file (nothing but an empty head and body), and sure enough the problem occurred elsewhere:

Creating EPUB Output...
Looking for large trees in ch06lev1sec8.html...
No large trees found
Looking for large trees in fm01lev1sec1.html...
No large trees found
Looking for large trees in ch14lev1sec2.html...
No large trees found
Looking for large trees in ch11lev1sec8.html...
No large trees found
Looking for large trees in ch04lev1sec4.html...
No large trees found
Looking for large trees in app03lev1sec3.html...
No large trees found
Splitting on page-break
Looking for large trees in F.html...
No large trees found
Split into 2 parts
Splitting on page-break
Splitting on page-break
Python function terminated unexpectedly
('utf8', '/*/*[2]/\x18\x81\x8d\x03:h2', 9, 10, 'unexpected code byte') (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 19, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 827, in run
File "site-packages\calibre\ebooks\epub\output.py", line 162, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 56, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 66, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 175, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 215, in split_on_page_breaks
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 283, in do_split
File "lxml.etree.pyx", line 1621, in lxml.etree._ElementTree.getpath (src/lxml/lxml.etree.c:17041)
File "apihelpers.pxi", line 1130, in lxml.etree.funicode (src/lxml/lxml.etree.c:36925)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 9: unexpected code byte
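The tuple printed before the traceback appears to be the standard argument list of a UnicodeDecodeError: codec name, the bytes being decoded, start offset, end offset, and reason. If that reading is right, the bytes that failed to decode are the path string itself rather than text taken from an HTML file. The sketch below, assuming Python 3 (where the reason is worded "invalid start byte" instead of "unexpected code byte"), decodes the same byte string and reports the offending position; `locate_bad_byte` is a hypothetical helper name.

```python
def locate_bad_byte(raw: bytes):
    """Return (offset, byte value) where UTF-8 decoding of `raw` fails, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start, raw[exc.start]
    return None


# Byte string copied from the error message in the log above
path_bytes = b"/*/*[2]/\x18\x81\x8d\x03:h2"
print(locate_bad_byte(path_bytes))  # -> (9, 129), i.e. byte 0x81 at position 9
```

The position and byte value match the last line of the traceback, which is consistent with the failure happening while the path string is converted to Unicode.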

Then I thought I would do a "binary search" on the file: I removed the first half and ran the conversion, then removed the second half and ran the conversion. However, this strategy failed, because the conversion failed with the above error in both cases (that is, it failed while processing a different file, F.html).
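For reference, the general shape of such a "binary search" can be sketched as below. It only works under two assumptions that do not seem to hold here: the failure must be deterministic, and it must be caused by exactly one item. `bisect_culprit` and the `fails` callback are hypothetical names; in practice `fails` would run a conversion on the given subset and report whether it crashed.

```python
def bisect_culprit(items, fails):
    """Narrow `items` down to the single element that makes `fails` return True.

    Assumes fails(items) is True, that the result is deterministic,
    and that exactly one item is responsible.
    """
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # Keep whichever half still reproduces the failure
        candidates = first if fails(first) else second
    return candidates[0]


# Toy example: pretend d.html is the only file that breaks the conversion
files = ["a.html", "b.html", "c.html", "d.html", "e.html"]
print(bisect_culprit(files, lambda subset: "d.html" in subset))  # -> d.html
```

When both halves fail, as happened here, the single-culprit assumption is broken and the search cannot narrow anything down.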

At this point I put back the original HTML and reran the conversion, which failed again at the above location, that is, at F.html!

Then I deleted all the HTML files, re-extracted them from the CHM, and ran the conversion again without changing anything, just to see what would happen. This time it failed at yet another location:

Creating EPUB Output...
Looking for large trees in ch06lev1sec8.html...
No large trees found
Looking for large trees in fm01lev1sec1.html...
No large trees found
Looking for large trees in fm01lev1sec7.html...
No large trees found
Looking for large trees in ch11lev1sec8.html...
No large trees found
Looking for large trees in ch07lev1sec2.html...
No large trees found
Splitting on page-break
Splitting on page-break
Python function terminated unexpectedly
('utf8', '/*/*[2]/\x90\x1fl\x04:h2', 8, 9, 'unexpected code byte') (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 19, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 827, in run
File "site-packages\calibre\ebooks\epub\output.py", line 162, in convert
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 56, in __call__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 66, in split_item
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 175, in __init__
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 215, in split_on_page_breaks
File "site-packages\calibre\ebooks\oeb\transforms\split.py", line 283, in do_split
File "lxml.etree.pyx", line 1621, in lxml.etree._ElementTree.getpath (src/lxml/lxml.etree.c:17041)
File "apihelpers.pxi", line 1130, in lxml.etree.funicode (src/lxml/lxml.etree.c:36925)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x90 in position 8: unexpected code byte

I can see that this run never reached F.html or ch14lev1sec2.html, which suggests that the order in which these files are processed is somewhat random, or that it depends on the order in which they appear in the directory on the file system (which is effectively random). That makes troubleshooting, or pinpointing the error, a bit more complicated.
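The order calibre uses cannot be controlled from outside, but any manual check of the extracted files can at least be made reproducible by sorting the directory listing, since `os.listdir` returns names in arbitrary order. A minimal sketch, assuming Python 3 and a flat directory of extracted files; `scan_html_files` is a hypothetical name:

```python
import os


def scan_html_files(directory):
    """Yield (filename, offset) for each .html file that contains a byte >= 0x80.

    Files are visited in sorted name order so repeated runs give the same result.
    """
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".html"):
            continue
        with open(os.path.join(directory, name), "rb") as fh:
            data = fh.read()
        # Offset of the first non-ASCII byte, or None if the file is pure ASCII
        offset = next((i for i, b in enumerate(data) if b >= 0x80), None)
        if offset is not None:
            yield name, offset
```

Calling `list(scan_html_files(path_to_extracted_files))` returns an empty list when every HTML file is plain ASCII, which would point away from the file contents as the source of the bad bytes.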

So I thought I would ask: do you have any suggestions for what I should try next?