Old 01-25-2012, 02:23 PM   #1
nimblebooks
Enthusiast
nimblebooks began at the beginning.
 
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
HTML to ePub stripping out Content text

Here is a puzzler. I am running ebook-convert on an HTML TOC document with the following settings:

sudo ebook-convert tmp/temptoc.html $mediatargetpath$sku".epub" --max-levels=1 --toc-threshold=100 --cover=$imagedir$sku$cover_image_extension --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=100 --preserve-cover-aspect-ratio

The document 1.html referenced by tmp/temptoc.html

http://en.wikipedia.org/w/index.php?...&title=Magento

has a "Contents" section whose html source looks like this:

Quote:
<table id="toc" class="toc">
<tr>
<td>
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#History"><span class="tocnumber">1</span> <span class="toctext">History</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#See_also"><span class="tocnumber">2</span> <span class="toctext">See also</span></a></li>
<li class="toclevel-1 tocsection-3"><a href="#References"><span class="tocnumber">3</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-4"><a href="#External_links"><span class="tocnumber">4</span> <span class="toctext">External links</span></a></li>
</ul>
</td>
</tr>
</table>
When Calibre processes this document, it removes the text from the list items, so that all that shows up is four empty bullets, which looks stupid. I used Sigil to inspect the HTML inside the ePub, and it looks as if Calibre is applying new styles to what it detects as TOC bullets.

Quote:
<body class="calibre">
<table class="toc" id="toc">
<tr class="calibre11">
<td class="calibre15">
<div class="calibre8" id="toctitle">
<h2 class="calibre16" id="calibre_pb_1">Contents</h2>
</div>

<ul class="calibre9">
<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#History"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#See_also"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#References"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#External_links"></a></li>
</ul>
</td>
</tr>
</table>
</body>
</html>
Apparently this has something to do with toc detection, but I've been pulling my hair out and haven't gotten anywhere. Can some kind soul speed things along for me?
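The empty anchors shown above can also be spotted without opening Sigil. What follows is only a minimal sketch using the Python standard library; the names EmptyAnchorFinder, find_empty_anchors and scan_epub are made up for illustration and are not part of Calibre. It lists every link whose text is missing, file by file, inside an ePub.

```python
import zipfile
from html.parser import HTMLParser


class EmptyAnchorFinder(HTMLParser):
    """Collect the href of every <a> element that contains no visible text."""

    def __init__(self):
        super().__init__()
        self.empty_hrefs = []
        self._in_anchor = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            # Whitespace-only link text counts as empty.
            if not "".join(self._text).strip():
                self.empty_hrefs.append(self._href)
            self._in_anchor = False


def find_empty_anchors(markup):
    """Return the hrefs of all text-less anchors in an HTML string."""
    finder = EmptyAnchorFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty_hrefs


def scan_epub(path):
    """Map each HTML file inside the ePub (a zip archive) to its empty anchors."""
    results = {}
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                markup = book.read(name).decode("utf-8", errors="replace")
                hits = find_empty_anchors(markup)
                if hits:
                    results[name] = hits
    return results
```

Running find_empty_anchors on the source HTML and again on the converted file should show whether the text was already missing before conversion or was lost during it.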
Attached Files
File Type: epub 614162738.epub (148.6 KB, 233 views)
File Type: pdf wikisourcehtml.pdf (127.9 KB, 725 views)
Old 01-27-2012, 02:56 PM   #2
nimblebooks
Bump! Maybe I provided too much information. The basic problem is this: I am providing an HTML TOC (doc 0) for a collection of n HTML documents. Calibre is correctly crawling and building that TOC document, but when it encounters a section labeled "Contents" in doc 1, it applies its own classes *AND* strips out the text between the anchor tags. The result is that doc 1 has a bunch of blank bullets where it should have an unchanged "Contents" section (the contents are of doc 1, not the whole collection). Please help! This is on my critical path.
Old 01-28-2012, 12:28 AM   #3
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You provided too little information. A minimal test case is what is needed, i.e. a small set of HTML files that show this behavior.
Old 01-30-2012, 07:07 PM   #4
nimblebooks
Here are the requested HTML files and the ebook-convert command used. There are some other problems, but the one that I'm trying to figure out right now is that the "Contents" section in document 1.html is blanked out in the resulting ePub.

sudo ebook-convert tmp/temptoc.html $mediatargetpath$sku".epub" --max-levels=1 --toc-threshold=100 --level1-toc="//h:h1" --level2-toc="//h:h2" --cover=$imagedir$sku$cover_image_extension --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=100 --preserve-cover-aspect-ratio
Attached Files
File Type: zip htmls.zip (76.8 KB, 198 views)
Old 01-31-2012, 07:38 PM   #5
nimblebooks
bump!
Old 01-31-2012, 08:23 PM   #6
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Moderator Notice
Please don't bump (it is rude to folks who get mail notifications).
227 people who did not have an answer viewed this thread.
Old 02-01-2012, 01:50 AM   #7
kovidgoyal
The HTML in those files is a complete mess. Parsing it is failing; this has nothing to do with the table of contents. For a start, remove the invalid title and head tags at the beginning of the document.
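One rough way to look for stray tags of this kind is to count how often elements that should appear only once actually occur. The sketch below is only an illustration: it assumes that "invalid" here means duplicated html, head, title or body tags, which the thread does not confirm, and the names TagCounter and duplicated_singletons are hypothetical. Python's html.parser is lenient and does not validate anything; it just reports the tags it sees.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def duplicated_singletons(markup, singletons=("html", "head", "title", "body")):
    """Return {tag: count} for tags that should occur once but occur more often.

    Assumption: a document with more than one of these tags is suspect.
    This is a heuristic check, not an HTML validator.
    """
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return {tag: counter.counts[tag] for tag in singletons if counter.counts[tag] > 1}
```

An empty result does not prove the file is well formed; it only means none of the listed tags is repeated. A non-empty result points at the tags to inspect by hand before running the conversion again.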

Last edited by kovidgoyal; 02-01-2012 at 01:54 AM.

Tags
epub, html, toc creation, toc detection, toc problem

