Perfect CHM to epub?


Artha
07-17-2011, 07:10 AM
CHM is compiled HTML. epub is a zipped collection of HTML files plus a few extra files. So is a perfect conversion from CHM to epub possible, or are there problems?

Toxaris
07-17-2011, 12:43 PM
An automatic conversion will never be perfect. I believe Calibre does a reasonable job. It all depends on the source, of course. Tables are always cumbersome.

Artha
07-21-2011, 03:10 PM
So CHM isn't exactly HTML?

Jim Lester
07-21-2011, 07:59 PM
ePub contains XHTML, not HTML, so if the HTML in your CHM is not well-formed, there will be problems.
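
To make the well-formedness point concrete, here is a minimal sketch in Python that feeds markup to a strict XML parser and reports the first error. The function name `check_well_formed` is made up for illustration. This is only a rough check: the standard-library parser does not load the XHTML DTD, so named entities such as `&nbsp;` are reported as errors too.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if the markup parses as XML, else a short error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple pointing at the problem.
        line, column = exc.position
        return f"line {line}, column {column}: {exc}"
    return None


if __name__ == "__main__":
    # Typical HTML habit: <br> left unclosed, which XHTML does not allow.
    html_style = "<html><body><p>Hello<br>world</p></body></html>"
    xhtml_style = "<html><body><p>Hello<br/>world</p></body></html>"
    print(check_well_formed(html_style))
    print(check_well_formed(xhtml_style))
```

Running this over each page extracted from the CHM should give a rough idea of how much cleanup is needed before the pages can go into an ePub.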

Toxaris
07-22-2011, 03:19 AM
And don't forget, CHM is not ePUB. Some formatting that is possible in CHM is not really doable in ePUB. Large tables are a known problem in ePUB.

pooja r
07-27-2011, 02:25 PM
Hi Artha,
I have a problem converting CHM to epub using Calibre. I get only the preface, the contributors, and the cover of the book, but not the chapters. Can you help me with this?

Artha
07-31-2011, 04:16 AM
Pooja: this is what I am also looking for.

Toxaris
07-31-2011, 07:41 AM
What you can do is transform the CHM back to HTML and create the ePUB by hand.
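
As a rough illustration of what "by hand" involves, here is a minimal Python sketch that packs already-cleaned XHTML pages into an EPUB 2 container. The function name `build_epub`, the `OEBPS/` folder, the generic "Section N" labels, and the hard-coded language `en` are all assumptions made for this example, and the result has not been run through a validator.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(out_path, title, book_id, chapters):
    """Write a minimal EPUB 2 file.

    chapters is a list of (file_name, xhtml_text) pairs in reading order.
    File names are assumed to be simple names such as "ch1.xhtml".
    """
    safe_title = escape(title)
    safe_id = escape(book_id, {'"': "&quot;"})
    items, refs, navs = [], [], []
    for i, (name, _text) in enumerate(chapters, start=1):
        items.append(f'<item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>')
        refs.append(f'<itemref idref="ch{i}"/>')
        navs.append(
            f'<navPoint id="nav{i}" playOrder="{i}">'
            f"<navLabel><text>Section {i}</text></navLabel>"
            f'<content src="{name}"/></navPoint>'
        )
    # Package file: metadata, manifest (all files), spine (reading order).
    opf = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">\n'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        f"<dc:title>{safe_title}</dc:title>\n"
        "<dc:language>en</dc:language>\n"  # assumption: English content
        f'<dc:identifier id="bookid">{safe_id}</dc:identifier>\n'
        "</metadata>\n"
        '<manifest>\n<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>\n'
        + "\n".join(items)
        + '\n</manifest>\n<spine toc="ncx">\n'
        + "\n".join(refs)
        + "\n</spine>\n</package>\n"
    )
    # NCX file: the table of contents used by EPUB 2 readers.
    ncx = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">\n'
        f'<head><meta name="dtb:uid" content="{safe_id}"/></head>\n'
        f"<docTitle><text>{safe_title}</text></docTitle>\n"
        "<navMap>\n" + "\n".join(navs) + "\n</navMap>\n</ncx>\n"
    )
    with zipfile.ZipFile(out_path, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        deflated = zipfile.ZIP_DEFLATED
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflated)
        zf.writestr("OEBPS/content.opf", opf, compress_type=deflated)
        zf.writestr("OEBPS/toc.ncx", ncx, compress_type=deflated)
        for name, text in chapters:
            zf.writestr("OEBPS/" + name, text, compress_type=deflated)
```

A call might look like `build_epub("book.epub", "My Book", "urn:uuid:...", [("ch1.xhtml", page_text)])`, with the identifier and page text supplied by you. Images and stylesheets from the CHM would also need manifest entries, which this sketch leaves out.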

Artha
08-05-2011, 12:46 PM
Good point.

R&W
08-10-2011, 06:02 PM
Toxaris wrote: "What you can do is transform the CHM back to HTML and create the ePUB by hand."

OK, but how do I do this? What tools should I use?

I tried this by extracting the files from the CHM, then taking the "page.htm" file (a small one), loading it into Calibre, and converting it to ePUB. The result was a relatively large file (12 MB) with many blank pages (everything except the preface and authors). Still, that was better than converting the original .CHM file, which gave me an ePUB of only 3 pages.

When debugging, I found one interesting thing. When converting the .chm, the "processed" folder (and maybe also "parsed" and "structure") was very small compared with the "input" folder, so I got a 3-page ePUB. When converting "page.htm" (<300 KB), the "processed", "structure", and "parsed" folders were all the same size as the "input" folder (~50 MB each). So if 50 MB is processed, why do I get a 12 MB book without content? Is there a problem with the output plugin?
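
The folder comparison described above can be sketched in a few lines of Python. The stage names "input", "parsed", "structure", and "processed" come from the debug output mentioned in this post. The function names and the location of the debug folder are assumptions for illustration.

```python
import os


def dir_size(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compare_stages(debug_dir, stages=("input", "parsed", "structure", "processed")):
    """Map each stage folder that exists under debug_dir to its size in bytes."""
    sizes = {}
    for stage in stages:
        stage_path = os.path.join(debug_dir, stage)
        if os.path.isdir(stage_path):
            sizes[stage] = dir_size(stage_path)
    return sizes


if __name__ == "__main__":
    # "debug_out" is a hypothetical folder holding the conversion debug output.
    for stage, size in compare_stages("debug_out").items():
        print(f"{stage:10s} {size / 1_000_000:8.1f} MB")
```

A sharp drop in size between two stages points at the step where content is being lost.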

P.S. In all cases I used the "linearize tables" option in the latest Calibre, v0.8.13.
Is there something better for conversions?

Toxaris
08-11-2011, 02:04 AM
There are several tools that can extract the HTML from a CHM. There are also tools that can open the CHM for editing and then save it as HTML.
Don't run the HTML through Calibre, since it will not help you. Calibre follows this same procedure itself, and apparently there is some code in the HTML/CHM that is causing your issue.
Then you can either clean it up manually in a text editor or try loading it into Sigil to see what happens there.
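
For the extraction step, a small Python wrapper around a command-line extractor might look like the sketch below. It assumes one of three tools is installed and on the PATH: 7-Zip (`7z`), chmlib's `extract_chmLib`, or `hh.exe` on Windows. Which one you have is an assumption, and the function names are made up for illustration.

```python
import subprocess


def extraction_command(chm_path, out_dir, tool="7z"):
    """Build the argument list for extracting a CHM with the chosen tool."""
    if tool == "7z":
        # 7-Zip: "x" extracts with full paths; -o<dir> takes no space.
        return ["7z", "x", chm_path, "-o" + out_dir]
    if tool == "extract_chmLib":
        # chmlib's example extractor: <chm file> <output dir>.
        return ["extract_chmLib", chm_path, out_dir]
    if tool == "hh":
        # Windows HTML Help: -decompile <output dir> <chm file>.
        return ["hh.exe", "-decompile", out_dir, chm_path]
    raise ValueError("unknown tool: " + tool)


def extract_chm(chm_path, out_dir, tool="7z"):
    """Run the extractor and raise CalledProcessError if it fails."""
    subprocess.run(extraction_command(chm_path, out_dir, tool), check=True)


if __name__ == "__main__":
    # "book.chm" and "book_html" are placeholder names.
    print(extraction_command("book.chm", "book_html"))
```

After extraction, the output folder holds the HTML pages, images, and stylesheets, ready for cleanup in a text editor or Sigil.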