04-07-2010, 12:04 AM   #1
Nethfel
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: iPod/Stanza
Help with HTML to ePub conversion...?

Hi all,

I'm learning Calibre so I can get some documents onto my iPod touch. So far I'm quite impressed with Calibre and what it brings to organizing and managing eBooks, but I've run into a problem I'm not sure how to handle.

I downloaded the demo HTML zip file and ran it through the conversion, and everything looked great.

I then took a file that I have as a PDF. If I have calibre convert it to EPUB, it converts fine but, as everyone knows, loses the TOC. The books I want to keep with me are reference books, so the TOC is kind of important to me. To keep it, I obtained, installed, and used pdftohtml, having it create a "complex" HTML document, i.e.:

pdftohtml -c mypdfile.pdf

This generates an HTML file for each page, puts all of the images in separate files (for this particular PDF, they are .png), and creates an index page.

I loaded the index page into calibre and asked it to convert - after a brief period the epub book was ready - it converted, TOC worked, etc. - but I found that on the pages, the images were no longer in their proper location.

I looked at the source code of a given page, and I have a feeling it's due to how pdftohtml generated it. Here is a sample of the code:

Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 8</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<DIV style="position:relative;width:918;height:1188;">
<STYLE type="text/css">
<!--
	.ft0{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.00000px;}
	.ft1{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87000px;}
	.ft2{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84340px;}
	.ft3{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81373px;}
	.ft4{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75788px;}
	.ft5{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74647px;}
	.ft6{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.73427px;}
	.ft7{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72560px;}
	.ft8{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72522px;}
	.ft9{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72251px;}
	.ft10{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72595px;}
	.ft11{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72563px;}
	.ft12{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71391px;}
	.ft13{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71963px;}
	.ft14{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71148px;}
	.ft15{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71440px;}
	.ft16{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.17570px;}
	.ft17{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.89888px;}
	.ft18{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83213px;}
	.ft19{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81253px;}
	.ft20{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80741px;}
	.ft21{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77645px;}
	.ft22{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75440px;}
	.ft23{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74228px;}
	.ft24{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74500px;}
	.ft25{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74331px;}
	.ft26{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74065px;}
	.ft27{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74611px;}
	.ft28{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74039px;}
	.ft29{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87390px;}
	.ft30{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80571px;}
	.ft31{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81655px;}
	.ft32{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81199px;}
	.ft33{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79734px;}
	.ft34{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80582px;}
	.ft35{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79753px;}
	.ft36{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78495px;}
	.ft37{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77855px;}
	.ft38{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78234px;}
	.ft39{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76726px;}
	.ft40{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04580px;}
	.ft41{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.92880px;}
	.ft42{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84356px;}
	.ft43{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.82962px;}
	.ft44{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83790px;}
	.ft45{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80870px;}
	.ft46{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80137px;}
	.ft47{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80265px;}
	.ft48{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79859px;}
	.ft49{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79262px;}
	.ft50{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80344px;}
	.ft51{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80419px;}
	.ft52{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.02340px;}
	.ft53{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88146px;}
	.ft54{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85470px;}
	.ft55{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.86156px;}
	.ft56{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85215px;}
	.ft57{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.03800px;}
	.ft58{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85680px;}
	.ft59{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76957px;}
	.ft60{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78810px;}
	.ft61{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77705px;}
	.ft62{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80504px;}
	.ft63{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76955px;}
	.ft64{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77257px;}
	.ft65{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75435px;}
	.ft66{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75527px;}
	.ft67{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77550px;}
	.ft68{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76605px;}
	.ft69{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76186px;}
	.ft70{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76474px;}
	.ft71{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.16572px;}
	.ft72{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04940px;}
	.ft73{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.93998px;}
	.ft74{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88517px;}
	.ft75{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87275px;}
	.ft76{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85225px;}
	.ft77{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84215px;}
	.ft78{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85129px;}
	.ft79{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83136px;}
	.ft80{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79986px;}
	.ft81{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79684px;}
	.ft82{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78436px;}
-->
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<IMG width="918" height="1188" src="Leveraging_AD_on_MOSXS_2.2008.png" alt="background image">
<DIV style="position:absolute;top:1079;left:108"><nobr><span class="ft0">8</span></nobr></DIV>
<DIV style="position:absolute;top:107;left:181"><nobr><span class="ft15">3. Click on your server's disclosure triangle in the table on the left and click on</span></nobr></DIV>
<DIV style="position:absolute;top:128;left:194"><nobr><span class="ft28">the Open Directory service to view its status. Click on the Settings button in</span></nobr></DIV>
<DIV style="position:absolute;top:150;left:194"><nobr><span class="ft29">the toolbar.</span></nobr></DIV>
<DIV style="position:absolute;top:489;left:181"><nobr><span class="ft39">4. Click on the "Change" button to open the Service Configuration Assistant.</span></nobr></DIV>
<DIV style="position:absolute;top:741;left:181"><nobr><span class="ft51">5. Choose to create an "Open Directory Master" and provide a username and</span></nobr></DIV>
<DIV style="position:absolute;top:762;left:194"><nobr><span class="ft56">password to administer the new domain.</span></nobr></DIV>
<DIV style="position:absolute;top:1036;left:181"><nobr><span class="ft70">6. In the third pane (Master Domain Info), you will be prompted for a Kerberos</span></nobr></DIV>
<DIV style="position:absolute;top:1057;left:194"><nobr><span class="ft82">Realm name and search base. The Kerberos realm name is arbitrary, but is</span></nobr></DIV>
</DIV>
</BODY>
</HTML>
As one can see, the divs are set with specific positions, and I'm guessing this is what is causing the problem.

If I convert without the -c, I don't get the pics.

If anyone has suggestions on how to better convert my PDFs to HTML before importing, so that I can somewhat keep the formatting and, more importantly, keep my TOC, I'd be more than willing to try them.

I should note that I'm using OS X.
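
For readers who want to see what those fixed positions mean in practice, here is a minimal Python sketch (not part of the original post, and not something calibre or pdftohtml provides). It assumes pages shaped like the sample above: one un-nested DIV per text line, with a style of the form `position:absolute;top:N;left:N`. The names `PositionedTextParser` and `reading_order` are made up for illustration. The sketch pulls the text lines out of a page and sorts them top-to-bottom, left-to-right, which is roughly the first step toward reflowable text. It ignores the background image entirely and does not merge lines into paragraphs.

```python
import re
from html.parser import HTMLParser

# Matches the inline style that the sample page uses on each text line.
POSITION = re.compile(
    r"position:\s*absolute;\s*top:\s*(\d+);\s*left:\s*(\d+)", re.IGNORECASE
)


class PositionedTextParser(HTMLParser):
    """Collect (top, left, text) for every absolutely positioned DIV."""

    def __init__(self):
        super().__init__()
        self.lines = []
        # [top, left, text chunks] while inside a positioned DIV, else None.
        self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "div":
            return
        match = POSITION.search(dict(attrs).get("style") or "")
        if match:
            self._current = [int(match.group(1)), int(match.group(2)), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        # Assumption: positioned DIVs are never nested inside each other.
        if tag == "div" and self._current is not None:
            top, left, chunks = self._current
            self.lines.append((top, left, "".join(chunks).strip()))
            self._current = None


def reading_order(page_html):
    """Return the page's text lines sorted by vertical, then horizontal position."""
    parser = PositionedTextParser()
    parser.feed(page_html)
    parser.close()
    return [text for _top, _left, text in sorted(parser.lines) if text]
```

Calling `reading_order()` on the text of one generated page file returns a plain list of strings. Note that on the sample page the page number sits at `top:1079`, so it would sort to the end of the list rather than being dropped.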
04-07-2010, 12:21 AM   #2
kovidgoyal
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Complex HTML documents are not reflowable. Converting a PDF to complex HTML is largely meaningless, since in both cases absolute positioning is used to position elements on a page.
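
As a rough illustration of the point about absolute positioning (a sketch added here, not anything from calibre itself), the following Python snippet checks whether an HTML page relies on `position:absolute` before you bother feeding it to a converter. The function names and the default threshold of one occurrence are assumptions chosen for the example, and a plain text search like this can be fooled by the phrase appearing in ordinary page text.

```python
import re

# Inline styles or stylesheet rules that pin an element to fixed coordinates.
ABSOLUTE = re.compile(r"position\s*:\s*absolute", re.IGNORECASE)


def count_absolute_positions(html_text):
    """Count occurrences of 'position:absolute' anywhere in the markup."""
    return len(ABSOLUTE.findall(html_text))


def looks_fixed_layout(html_text, threshold=1):
    """Guess that a page is fixed-layout if it pins at least `threshold` elements."""
    return count_absolute_positions(html_text) >= threshold
```

A page like the one quoted in post #1 would be flagged, since each of its text lines carries its own absolute position.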
04-07-2010, 12:29 AM   #3
Nethfel
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: iPod/Stanza
Ew, that's what I was afraid of. Are there any other tools you could suggest as an in-between step, so that I can keep my TOCs once I take the files through Calibre?
04-07-2010, 12:34 AM   #4
kovidgoyal
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Not at the moment. calibre's new PDF engine (under development) preserves TOCs automatically, but I won't be able to find the time to complete it for a while.
05-10-2010, 02:26 PM   #5
pablo2340
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: Sony ereader Prs-300
Have you tried Sigil?
I find that it converts HTML files pretty well to epub format...

All times are GMT -4.