02-20-2012, 01:59 PM   #1
nimblebooks
HTML input plugin stripping text within toc tags in child html file

Hi,

Same problem as a while ago, but I have done some more testing. Files attached.

ebook-convert tmp/temptoc.html $mediatargetpath$sku".epub" --max-levels=1 --toc-threshold=6 --cover=$imagedir$sku$cover_image_extension --book-producer="Nimble Combinatorial Publishing" --publisher="Nimble Combinatorial Publishing" --max-toc-links=20 --preserve-cover-aspect-ratio -vv --debug-pipeline="debug" --duplicate-links-in-toc --chapter="/"

From the debug output, I can tell that the conversion is getting messed up in the input plugin stage. The following HTML in the source file safe1.html (generated from the API call to http://en.wikipedia.org/w/index.php?...eship_Bismarck)

Code:
<table id="toc" class="toc">
<tr>
<td>
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Construction_and_characteristics"><span class="tocnumber">1</span> <span class="toctext">Construction and characteristics</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#Service_history"><span class="tocnumber">2</span> <span class="toctext">Service history</span></a>
<ul>
<li class="toclevel-2 tocsection-3"><a href="#Operation_Rhein.C3.BCbung"><span class="tocnumber">2.1</span> <span class="toctext">Operation Rheinübung</span></a>
<ul>
<li class="toclevel-3 tocsection-4"><a href="#Battle_of_the_Denmark_Strait"><span class="tocnumber">2.1.1</span> <span class="toctext">Battle of the Denmark Strait</span></a></li>
<li class="toclevel-3 tocsection-5"><a href="#The_chase"><span class="tocnumber">2.1.2</span> <span class="toctext">The chase</span></a></li>
<li class="toclevel-3 tocsection-6"><a href="#Sinking"><span class="tocnumber">2.1.3</span> <span class="toctext">Sinking</span></a></li>
</ul>
</li>
</ul>
</li>
<li class="toclevel-1 tocsection-7"><a href="#Media_portrayals_of_sinking"><span class="tocnumber">3</span> <span class="toctext">Media portrayals of sinking</span></a></li>
<li class="toclevel-1 tocsection-8"><a href="#Discovery_of_the_wreck"><span class="tocnumber">4</span> <span class="toctext">Discovery of the wreck</span></a>
<ul>
<li class="toclevel-2 tocsection-9"><a href="#Discovery_by_Robert_Ballard"><span class="tocnumber">4.1</span> <span class="toctext">Discovery by Robert Ballard</span></a></li>
<li class="toclevel-2 tocsection-10"><a href="#Subsequent_expeditions"><span class="tocnumber">4.2</span> <span class="toctext">Subsequent expeditions</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-11"><a href="#References_in_the_Wehrmachtbericht"><span class="tocnumber">5</span> <span class="toctext">References in the Wehrmachtbericht</span></a></li>
<li class="toclevel-1 tocsection-12"><a href="#Footnotes"><span class="tocnumber">6</span> <span class="toctext">Footnotes</span></a></li>
<li class="toclevel-1 tocsection-13"><a href="#References"><span class="tocnumber">7</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-14"><a href="#Further_Reading"><span class="tocnumber">8</span> <span class="toctext">Further Reading</span></a></li>
</ul>
</td>
</tr>
</table>
becomes in debug/input/1safe.html:

Code:
<table class="toc" id="toc">
<tbody><tr>
<td>
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Construction_and_characteristics"> </a></li>
<li class="toclevel-1 tocsection-2"><a href="#Service_history"> </a>
<ul>
<li class="toclevel-2 tocsection-3"><a href="#Operation_Rhein.C3.BCbung"> </a>
<ul>
<li class="toclevel-3 tocsection-4"><a href="#Battle_of_the_Denmark_Strait"> </a></li>
<li class="toclevel-3 tocsection-5"><a href="#The_chase"> </a></li>
<li class="toclevel-3 tocsection-6"><a href="#Sinking"> </a></li>
</ul>
</li>
</ul>
</li>
<li class="toclevel-1 tocsection-7"><a href="#Media_portrayals_of_sinking"> </a></li>
<li class="toclevel-1 tocsection-8"><a href="#Discovery_of_the_wreck"> </a>
<ul>
<li class="toclevel-2 tocsection-9"><a href="#Discovery_by_Robert_Ballard"> </a></li>
<li class="toclevel-2 tocsection-10"><a href="#Subsequent_expeditions"> </a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-11"><a href="#References_in_the_Wehrmachtbericht"> </a></li>
<li class="toclevel-1 tocsection-12"><a href="#Footnotes"> </a></li>
<li class="toclevel-1 tocsection-13"><a href="#References"> </a></li>
<li class="toclevel-1 tocsection-14"><a href="#Further_Reading"> </a></li>
</ul>
</td>
</tr>
</tbody></table>
after it is passed through the input plugin.
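
To pin down exactly which links lose their text, the two files can be compared programmatically instead of by eye. The following is only a minimal sketch using the Python standard library. The names AnchorTextCollector, anchor_texts and stripped_links are made up for illustration and are not part of calibre, and the sketch only reports the symptom, it does not explain the cause.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the visible text found inside each <a href="..."> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.texts = {}        # href -> concatenated text inside the link
        self._current = None   # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._current = href
                self.texts.setdefault(href, "")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

    def handle_data(self, data):
        # Text inside nested <span> elements still arrives here.
        if self._current is not None:
            self.texts[self._current] += data


def anchor_texts(html):
    """Return {href: whitespace-normalised link text} for one HTML string."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return {href: " ".join(text.split()) for href, text in parser.texts.items()}


def stripped_links(before_html, after_html):
    """List hrefs that had link text before but have none (or are missing) after."""
    before = anchor_texts(before_html)
    after = anchor_texts(after_html)
    return sorted(href for href, text in before.items()
                  if text and not after.get(href, ""))
```

Reading safe1.html and debug/input/1safe.html into strings and passing them to stripped_links(before, after) should list every href whose link text disappeared between the source and the input-stage output.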

I simplified the TOC HTML as much as possible and wrapped the simplest possible HTML around the API HTML found in source file/safe1.html.

What's happening here?

Any help disentangling this "messy" HTML would be much appreciated!

Fred
Attached Files
File Type: zip input.zip (33.1 KB, 276 views)
File Type: zip source_files.zip (71.6 KB, 286 views)