MobileRead Forums

avggeek · 07-03-2011, 04:42 AM

Hello,

It's possible that the source file I'm using is simply borked, but since I'm terrible at regex, I figured it would be best to ask whether there is some other problem here.

When I try to convert a CHM file, I get a single-page EPUB containing some junk characters. This is the output log:

Code:

Opening CHM file
Extracting CHM to c:\users\avggeek\appdata\local\temp\calibre_0.8.5_tmp_zyu6tp\calibre_0.8.5_knfnqa_chm2oeb
Found 0 section nodes
Language not specified
Title not specified
Building file list...
Found files...
HTMLFile:0:a:c:\users\avggeek\appdata\local\temp\calibre_0.8.5_tmp_zyu6tp\calibre_0.8.5_knfnqa_chm2oeb\001.html
Normalizing filename cases
Rewriting HTML links
Parsing 001.html ...
Initial parse failed:
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\base.py", line 886, in first_pass
  File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
  File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
  File "parser.pxi", line 1445, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78709)
  File "parser.pxi", line 920, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:75083)
  File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: input conversion failed due to input error, bytes 0xBB 0xDB 0x0E 0x00

Parsing file '001.html' as HTML
File '001.html' does not appear to be (X)HTML
File '001.html' appears to be a HTML fragment
Forcing 001.html into XHTML namespace
File '001.html' missing <head/> element
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 1 items of level: p_1
Ignoring level p_1
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Looking for large trees in 001.html...
No large trees found
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to c:\users\avggeek\appdata\local\temp\calibre_0.8.5_tmp_zyu6tp\calibre_0.8.5_xvuuek.epub

This is on calibre 0.8.5, running under Windows 7 64-bit.
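
In case it helps narrow down whether the source file itself is the problem, here is a rough sanity check I could run on it. It is only a sketch and not part of calibre: it assumes that a valid CHM container starts with the 4-byte signature "ITSF", and the function name looks_like_chm is just something I made up for illustration. It only looks at the header, so a file can pass this check and still be damaged further in.

```python
# Minimal sketch: check whether a file starts with the CHM container signature.
# Assumption: valid CHM files begin with the ASCII bytes "ITSF".
CHM_MAGIC = b"ITSF"


def looks_like_chm(path):
    """Return True if the first four bytes of the file are the ITSF signature."""
    with open(path, "rb") as f:
        return f.read(4) == CHM_MAGIC
```

Calling looks_like_chm with the path to the source file returns False if the file is not a CHM container at all (for example, a renamed or truncated download). A True result only says the header looks right.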