01-25-2012, 02:23 PM | #1 | ||
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
|
HTML to ePub stripping out Content text
Here is a puzzler. I am running ebook-convert on an HTML TOC document with the following settings:
sudo ebook-convert tmp/temptoc.html $mediatargetpath$sku".epub" --max-levels=1 --toc-threshold=100 --cover=$imagedir$sku$cover_image_extension --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=100 --preserve-cover-aspect-ratio
The document 1.html referenced by tmp/temptoc.html (http://en.wikipedia.org/w/index.php?...&title=Magento) has a "Contents" section whose HTML source looks like this:
Quote:
|
||
01-27-2012, 02:56 PM | #2 |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
|
Bump! Maybe I provided too much information. The basic problem is this: I am providing an HTML TOC (doc 0) for a collection of n HTML documents. Calibre correctly crawls the documents linked from that TOC, but when it encounters a section labeled "Contents" in doc #1, it applies its own classes *AND* strips out the text between the anchor tags. The result is that doc 1 has a bunch of blank bullets where it should have an unchanged "Contents" section (the contents are those of doc 1, not of the whole collection). Please help! This is on my critical path.
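One way to confirm the symptom described above is to scan an HTML file for anchors that carry no text. The following is a minimal sketch using only the Python standard library; the names EmptyAnchorFinder and find_empty_anchors are hypothetical, and it assumes you have already extracted the relevant HTML file from the generated ePub (an ePub is a zip archive). It does not reproduce anything calibre does internally.

```python
from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Collect the href of every <a> element that contains no visible text."""

    def __init__(self):
        super().__init__()
        self.empty_hrefs = []    # hrefs of anchors whose text is blank
        self._in_anchor = False  # True while an <a> element is open
        self._href = None        # href of the currently open <a>
        self._text = []          # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text inside nested elements (e.g. <span>) is also counted.
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            if not "".join(self._text).strip():
                self.empty_hrefs.append(self._href)
            self._in_anchor = False


def find_empty_anchors(html_text):
    """Return the hrefs of all anchors in html_text that have no text."""
    finder = EmptyAnchorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.empty_hrefs
```

Running find_empty_anchors on both the source 1.html and the converted file would show whether the anchor text is already missing in the input or is lost during conversion.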
|
01-28-2012, 12:28 AM | #3 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You provided too little information. A minimal test case is what is needed, i.e. a small set of HTML files that show this behavior.
|
01-30-2012, 07:07 PM | #4 |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
|
Here are the requested HTML files and the ebook-convert command used. There are some other problems, but the one I'm trying to figure out right now is that the "Contents" section in document 1.html is blanked out in the resulting ePub.
sudo ebook-convert tmp/temptoc.html $mediatargetpath$sku".epub" --max-levels=1 --toc-threshold=100 --level1-toc="//h:h1" --level2-toc="//h:h2" --cover=$imagedir$sku$cover_image_extension --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=100 --preserve-cover-aspect-ratio
|
01-31-2012, 07:38 PM | #5 |
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
|
bump!
|
01-31-2012, 08:23 PM | #6 |
Well trained by Cats
Posts: 29,811
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Moderator Notice
Please don't bump (it is rude to folks who get mail notifications). 227 people who did not have an answer viewed this thread.
|
02-01-2012, 01:50 AM | #7 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The HTML in those files is a complete mess. Parsing it is failing; this has nothing to do with the table of contents. For a start, remove the invalid title and head tags at the beginning of the document.
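To locate stray structural tags like the ones mentioned above before converting, a rough check can be sketched with the Python standard library. This is only an illustration: the names TagLocator and find_repeated_tags are hypothetical, and it assumes that "invalid" here means a tag such as head or title appearing more than once in a file, which the thread does not spell out. It reports where repeated tags start so they can be removed by hand.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Record the (line, column) of every opening tag we are watching."""

    def __init__(self, watched=("html", "head", "title", "body")):
        super().__init__()
        self.watched = set(watched)
        self.positions = {}  # tag name -> list of (line, column) tuples

    def handle_starttag(self, tag, attrs):
        if tag in self.watched:
            self.positions.setdefault(tag, []).append(self.getpos())


def find_repeated_tags(html_text):
    """Return watched tags that open more than once, with their positions."""
    locator = TagLocator()
    locator.feed(html_text)
    locator.close()
    return {
        tag: found
        for tag, found in locator.positions.items()
        if len(found) > 1
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_repeated_tags.py 1.html
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for tag, found in find_repeated_tags(handle.read()).items():
            print(tag, found)
```

A well-formed document should produce an empty result; any tag listed is a candidate for cleanup before running ebook-convert again.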
Last edited by kovidgoyal; 02-01-2012 at 01:54 AM. |
Tags |
epub, html, toc creation, toc detection, toc problem |
|