01-25-2012, 02:23 PM   #1
nimblebooks
HTML to ePub stripping out Content text

Here is a puzzler. I am running ebook-convert on an HTML TOC document with the following settings:

sudo ebook-convert tmp/temptoc.html "${mediatargetpath}${sku}.epub" --max-levels=1 --toc-threshold=100 --cover="${imagedir}${sku}${cover_image_extension}" --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=100 --preserve-cover-aspect-ratio

The document 1.html referenced by tmp/temptoc.html

http://en.wikipedia.org/w/index.php?...&title=Magento

has a "Contents" section whose HTML source looks like this:

Quote:
<table id="toc" class="toc">
<tr>
<td>
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#History"><span class="tocnumber">1</span> <span class="toctext">History</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#See_also"><span class="tocnumber">2</span> <span class="toctext">See also</span></a></li>
<li class="toclevel-1 tocsection-3"><a href="#References"><span class="tocnumber">3</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-4"><a href="#External_links"><span class="tocnumber">4</span> <span class="toctext">External links</span></a></li>
</ul>
</td>
</tr>
</table>
When Calibre processes this document, it strips the text out of each list item: the <span> elements that held the section number and title are gone, leaving four empty bullets. I used Sigil to inspect the HTML inside the ePub, and it looks as if Calibre is applying new styles to what it detects as TOC bullets.

Quote:
<body class="calibre">
<table class="toc" id="toc">
<tr class="calibre11">
<td class="calibre15">
<div class="calibre8" id="toctitle">
<h2 class="calibre16" id="calibre_pb_1">Contents</h2>
</div>

<ul class="calibre9">
<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#History"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#See_also"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#References"></a></li>

<li class="toclevel"><a class="calibre5" href="../Text/1_split_000.html#External_links"></a></li>
</ul>
</td>
</tr>
</table>
</body>
</html>
This apparently has something to do with TOC detection, but I've been pulling my hair out and haven't gotten anywhere. Can some kind soul speed things along for me?
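
To pin down exactly which links lost their text, a small diagnostic along these lines might help. It is only a sketch using Python's standard library and does not fix anything. The names LinkTextCollector, link_texts, empty_links and empty_links_in_epub are made up for illustration and are not part of Calibre or Sigil. It also assumes the files inside the ePub are UTF-8.

```python
import zipfile
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect (href, visible text) for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so nested <span>s read naturally.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def link_texts(html):
    """Return a list of (href, text) pairs for all links in an HTML string."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def empty_links(html):
    """Return the hrefs of links that have no visible text at all."""
    return [href for href, text in link_texts(html) if not text]


def empty_links_in_epub(epub):
    """Map each HTML file inside an ePub (a zip) to its text-less links."""
    report = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                # Assumption: content files are UTF-8 encoded.
                markup = archive.read(name).decode("utf-8", errors="replace")
                missing = empty_links(markup)
                if missing:
                    report[name] = missing
    return report


if __name__ == "__main__":
    # One list item from the Wikipedia source and its converted counterpart.
    source = ('<li class="toclevel-1 tocsection-1"><a href="#History">'
              '<span class="tocnumber">1</span> '
              '<span class="toctext">History</span></a></li>')
    converted = ('<li class="toclevel"><a class="calibre5" '
                 'href="../Text/1_split_000.html#History"></a></li>')
    print(link_texts(source))      # [('#History', '1 History')]
    print(empty_links(converted))  # ['../Text/1_split_000.html#History']
```

Calling empty_links_in_epub("614162738.epub") on a local copy of the attached file should list every link that came out empty, grouped by the file inside the ePub that contains it. Running link_texts over the original 1.html shows what text each of those links was supposed to carry.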
Attached Files
File Type: epub 614162738.epub (148.6 KB, 233 views)
File Type: pdf wikisourcehtml.pdf (127.9 KB, 725 views)