04-07-2010, 12:04 AM
Nethfel
Device: iPod/Stanza
Help with HTML to ePub conversion...?

Hi all,

I'm learning Calibre so I can get some documents onto my iPod touch. So far I'm quite impressed with what Calibre brings to organizing and managing eBooks, but I've run into a problem I'm not sure how to handle.

I downloaded the demo HTML zip file, ran it through the conversion, and everything looked great.

I then took one of my PDFs. If I have Calibre convert it to ePub directly, it converts fine, but as everyone knows, it loses the TOC. The books I want to keep with me are reference books, so the TOC is important to me. To keep it, I downloaded, installed, and used pdftohtml, having it create a "complex" HTML document.

i.e.: pdftohtml -c mypdfile.pdf

This generates an HTML file for each page, puts all of the images in separate files (PNGs, for this particular PDF), and creates an index page.
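For anyone wanting to repeat this two-step route from a script, here is a small Python sketch that builds (and can optionally run) the two commands. It assumes pdftohtml and calibre's ebook-convert command-line tool are both on the PATH (on OS X, calibre's command-line tools may need to be added to the PATH first). The index page's file name differs between pdftohtml versions, so it is passed in rather than guessed; the file names in the usage lines are placeholders.

```python
import shlex
import subprocess
from typing import List


def build_commands(pdf_path: str, index_html: str, epub_path: str) -> List[List[str]]:
    """Return the PDF -> HTML -> ePub steps as argument lists (no shell quoting needed)."""
    return [
        # Step 1: pdftohtml in "complex" mode, exactly as used above.
        ["pdftohtml", "-c", pdf_path],
        # Step 2: calibre's command-line converter, fed the index page from step 1.
        ["ebook-convert", index_html, epub_path],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each step in order; check=True stops at the first tool that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names: substitute the real PDF and the index page pdftohtml produced.
    steps = build_commands("mypdfile.pdf", "mypdfile_index.html", "mypdfile.epub")
    for cmd in steps:
        print(shlex.join(cmd))  # show what would run; call run(steps) to execute
```

Keeping the commands as argument lists means file names with spaces need no extra quoting when passed to subprocess.run.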

I loaded the index page into Calibre and asked it to convert. After a short while the ePub book was ready: it converted, the TOC worked, and so on. However, on the pages themselves, the images were no longer in their proper locations.
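Because an ePub file is a ZIP archive, one quick way to narrow this down is to check whether the page images made it into the book at all, as opposed to being included but displayed in the wrong place. A minimal standard-library Python sketch; the script name and book file name are placeholders, and the extension lists are an assumption about what the book contains.

```python
import sys
import zipfile

PAGE_EXT = (".html", ".htm", ".xhtml")
IMAGE_EXT = (".png", ".jpg", ".jpeg", ".gif")


def list_epub_parts(epub):
    """Return (pages, images): sorted member names of an ePub, which is a ZIP container."""
    with zipfile.ZipFile(epub) as zf:
        names = zf.namelist()
    pages = sorted(n for n in names if n.lower().endswith(PAGE_EXT))
    images = sorted(n for n in names if n.lower().endswith(IMAGE_EXT))
    return pages, images


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_epub.py mybook.epub   (both names are placeholders)
    pages, images = list_epub_parts(sys.argv[1])
    print(f"{len(pages)} page files, {len(images)} images")
    for name in images:
        print("  ", name)
```

If the PNGs are listed, the problem is layout rather than missing files.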

I looked at the source of one of the pages, and I suspect it's due to how pdftohtml generated it. Here is a sample:

Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 8</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<DIV style="position:relative;width:918;height:1188;">
<STYLE type="text/css">
<!--
	.ft0{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.00000px;}
	.ft1{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87000px;}
	.ft2{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84340px;}
	.ft3{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81373px;}
	.ft4{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75788px;}
	.ft5{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74647px;}
	.ft6{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.73427px;}
	.ft7{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72560px;}
	.ft8{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72522px;}
	.ft9{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72251px;}
	.ft10{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72595px;}
	.ft11{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72563px;}
	.ft12{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71391px;}
	.ft13{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71963px;}
	.ft14{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71148px;}
	.ft15{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71440px;}
	.ft16{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.17570px;}
	.ft17{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.89888px;}
	.ft18{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83213px;}
	.ft19{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81253px;}
	.ft20{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80741px;}
	.ft21{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77645px;}
	.ft22{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75440px;}
	.ft23{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74228px;}
	.ft24{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74500px;}
	.ft25{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74331px;}
	.ft26{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74065px;}
	.ft27{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74611px;}
	.ft28{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74039px;}
	.ft29{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87390px;}
	.ft30{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80571px;}
	.ft31{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81655px;}
	.ft32{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81199px;}
	.ft33{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79734px;}
	.ft34{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80582px;}
	.ft35{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79753px;}
	.ft36{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78495px;}
	.ft37{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77855px;}
	.ft38{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78234px;}
	.ft39{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76726px;}
	.ft40{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04580px;}
	.ft41{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.92880px;}
	.ft42{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84356px;}
	.ft43{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.82962px;}
	.ft44{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83790px;}
	.ft45{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80870px;}
	.ft46{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80137px;}
	.ft47{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80265px;}
	.ft48{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79859px;}
	.ft49{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79262px;}
	.ft50{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80344px;}
	.ft51{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80419px;}
	.ft52{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.02340px;}
	.ft53{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88146px;}
	.ft54{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85470px;}
	.ft55{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.86156px;}
	.ft56{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85215px;}
	.ft57{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.03800px;}
	.ft58{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85680px;}
	.ft59{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76957px;}
	.ft60{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78810px;}
	.ft61{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77705px;}
	.ft62{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80504px;}
	.ft63{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76955px;}
	.ft64{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77257px;}
	.ft65{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75435px;}
	.ft66{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75527px;}
	.ft67{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77550px;}
	.ft68{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76605px;}
	.ft69{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76186px;}
	.ft70{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76474px;}
	.ft71{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.16572px;}
	.ft72{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04940px;}
	.ft73{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.93998px;}
	.ft74{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88517px;}
	.ft75{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87275px;}
	.ft76{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85225px;}
	.ft77{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84215px;}
	.ft78{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85129px;}
	.ft79{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83136px;}
	.ft80{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79986px;}
	.ft81{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79684px;}
	.ft82{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78436px;}
-->
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<IMG width="918" height="1188" src="Leveraging_AD_on_MOSXS_2.2008.png" alt="background image">
<DIV style="position:absolute;top:1079;left:108"><nobr><span class="ft0">8</span></nobr></DIV>
<DIV style="position:absolute;top:107;left:181"><nobr><span class="ft15">3. Click on your server's disclosure triangle in the table on the left and click on</span></nobr></DIV>
<DIV style="position:absolute;top:128;left:194"><nobr><span class="ft28">the Open Directory service to view its status. Click on the Settings button in</span></nobr></DIV>
<DIV style="position:absolute;top:150;left:194"><nobr><span class="ft29">the toolbar.</span></nobr></DIV>
<DIV style="position:absolute;top:489;left:181"><nobr><span class="ft39">4. Click on the "Change" button to open the Service Configuration Assistant.</span></nobr></DIV>
<DIV style="position:absolute;top:741;left:181"><nobr><span class="ft51">5. Choose to create an "Open Directory Master" and provide a username and</span></nobr></DIV>
<DIV style="position:absolute;top:762;left:194"><nobr><span class="ft56">password to administer the new domain.</span></nobr></DIV>
<DIV style="position:absolute;top:1036;left:181"><nobr><span class="ft70">6. In the third pane (Master Domain Info), you will be prompted for a Kerberos</span></nobr></DIV>
<DIV style="position:absolute;top:1057;left:194"><nobr><span class="ft82">Realm name and search base. The Kerberos realm name is arbitrary, but is</span></nobr></DIV>
</DIV>
</BODY>
</HTML>
As you can see, the DIVs are placed at fixed positions (position:absolute with explicit top/left coordinates), and I'm guessing this is what causes the problem.
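The sample shows two things that are likely to break in an ePub reader, although how calibre itself treats them is not confirmed here. Every text line is placed with position:absolute, and the coordinates have no units (top:1079 rather than top:1079px). CSS only allows a bare number for a zero length, so these declarations work in a browser only because the HTML 4.01 Transitional doctype puts it in quirks mode; a renderer that parses the page as XHTML, as ePub readers generally do, should treat them as invalid and drop them. The generated stylesheet also spells vertical-align as virtical-align, which any CSS parser ignores as an unknown property. A minimal standard-library Python sketch that counts absolutely positioned elements and lists unitless lengths in a pdftohtml page (the class name and the SAMPLE excerpt are illustrative):

```python
import re
from html.parser import HTMLParser

# A bare number is only valid CSS for a zero length, so "top:1079" is invalid outside
# quirks mode; the lookahead requires the number to end the declaration (no unit).
UNITLESS = re.compile(r"(?<![\w-])(top|left|width|height)\s*:\s*(\d+(?:\.\d+)?)\s*(?=;|$)")


class PositionAudit(HTMLParser):
    """Count absolutely positioned elements and collect unitless lengths in style attributes."""

    def __init__(self):
        super().__init__()
        self.absolute = 0   # elements with position:absolute
        self.unitless = []  # (property, value) pairs missing a unit

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "")
        if "position:absolute" in style:
            self.absolute += 1
        for prop, value in UNITLESS.findall(style):
            if float(value) != 0:  # "0" legitimately needs no unit
                self.unitless.append((prop, value))


# Excerpt of the page above (trimmed to one positioned line).
SAMPLE = (
    '<DIV style="position:relative;width:918;height:1188;">'
    '<DIV style="position:absolute;top:107;left:181"><nobr>'
    '<span class="ft15">3. Click on your server\'s disclosure triangle</span>'
    '</nobr></DIV></DIV>'
)

if __name__ == "__main__":
    audit = PositionAudit()
    audit.feed(SAMPLE)
    print(audit.absolute, audit.unitless)
    # prints: 1 [('width', '918'), ('height', '1188'), ('top', '107'), ('left', '181')]
```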

If I convert without -c, I don't get the images.
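Since -c is the mode that keeps the images, one possible workaround, untested against calibre, is to post-process each page before importing so the text no longer relies on absolute positioning. Judging by the sample, the images may not be separate pictures at all: the only IMG is a 918x1188 "background image", the same size as the page DIV, which suggests the page graphics are rendered into it with the text laid over the top. The Python sketch below (standard library only) reads each positioned DIV's top/left and text, sorts the lines into reading order, and joins them into plain <p> paragraphs. The class and function names are hypothetical, and the 25 px paragraph gap is a guess based on the 21-22 px line spacing in the sample. It only rebuilds the text; the background image would still need separate handling.

```python
import html
import re
from html.parser import HTMLParser

TOP_LEFT = re.compile(r"top:\s*(\d+).*?left:\s*(\d+)")


class TextBoxes(HTMLParser):
    """Collect (top, left, text) for every absolutely positioned DIV on a pdftohtml page."""

    def __init__(self):
        super().__init__()
        self.boxes = []
        self._open = None  # [top, left, text parts] while inside a positioned DIV

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        match = TOP_LEFT.search(style)
        if tag == "div" and "absolute" in style and match:
            self._open = [int(match.group(1)), int(match.group(2)), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._open is not None:
            top, left, parts = self._open
            self.boxes.append((top, left, "".join(parts).strip()))
            self._open = None


def to_paragraphs(boxes, gap=25):
    """Rebuild flowing <p> elements in reading order (heuristic; gap in px is a guess)."""
    paragraphs = []  # each item: [left of first line, accumulated text]
    last_top = None
    for top, left, text in sorted(boxes):
        continues = (
            paragraphs
            and top - last_top < gap       # the very next line down
            and left >= paragraphs[-1][0]  # not further left, e.g. a page number
        )
        if continues:
            paragraphs[-1][1] += " " + text
        else:
            paragraphs.append([left, text])
        last_top = top
    return "\n".join(f"<p>{html.escape(t, quote=False)}</p>" for _, t in paragraphs)


if __name__ == "__main__":
    page = (
        '<DIV style="position:absolute;top:1079;left:108"><nobr><span>8</span></nobr></DIV>'
        '<DIV style="position:absolute;top:1036;left:181"><nobr><span>6. In the third pane</span></nobr></DIV>'
        '<DIV style="position:absolute;top:1057;left:194"><nobr><span>Realm name</span></nobr></DIV>'
    )
    parser = TextBoxes()
    parser.feed(page)
    print(to_paragraphs(parser.boxes))
```

The left-margin check is what keeps the page number (left:108) from being glued onto the last step, since it sits only 22 px below it.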

If anyone has suggestions for a better way to convert my PDFs to HTML before importing, one that keeps some of the formatting and, more importantly, keeps my TOC, I'd be more than willing to try them.

I should note that I'm using OS X.