04-07-2010, 12:04 AM | #1
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: iPod/Stanza
Help with HTML to ePub conversion...?
Hi all,
I'm learning Calibre so I can get some documents onto my iPod touch. So far I'm quite impressed with Calibre and what it brings to organizing and managing eBooks, but I'm having a little trouble that I'm not sure how to handle. I downloaded the demo HTML zip file, ran it through the conversion, and everything looked great. I also have a PDF file: if I have Calibre convert it to EPUB, it converts fine but, as everyone knows, it loses the TOC. The books I want to keep with me are reference books, so the TOC is important to me. To keep it, I obtained, installed, and used pdftohtml, having it create a "complex" HTML document, i.e.:
pdftohtml -c mypdfile.pdf
This generates an HTML file for each page, puts every image in its own file (for this particular PDF they are .png files), and creates an index page. I loaded the index page into Calibre and asked it to convert. After a brief wait the EPUB book was ready: it converted, the TOC worked, etc. However, the images were no longer in their proper positions on the pages. I looked at the source of one of the pages, and I have a feeling the problem is due to how pdftohtml generated it. Here is a sample of the code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 8</TITLE>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<DIV style="position:relative;width:918;height:1188;">
<STYLE type="text/css">
<!--
.ft0{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.00000px;} .ft1{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87000px;} .ft2{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84340px;} .ft3{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81373px;} .ft4{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75788px;} .ft5{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74647px;} .ft6{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.73427px;} .ft7{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72560px;} .ft8{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72522px;} .ft9{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72251px;} .ft10{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72595px;} .ft11{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.72563px;} .ft12{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71391px;} .ft13{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71963px;} .ft14{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71148px;} .ft15{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.71440px;} .ft16{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.17570px;}
.ft17{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.89888px;} .ft18{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83213px;} .ft19{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81253px;} .ft20{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80741px;} .ft21{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77645px;} .ft22{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75440px;} .ft23{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74228px;} .ft24{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74500px;} .ft25{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74331px;} .ft26{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74065px;} .ft27{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74611px;} .ft28{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.74039px;} .ft29{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87390px;} .ft30{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80571px;} .ft31{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81655px;} .ft32{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.81199px;} .ft33{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79734px;} .ft34{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80582px;} .ft35{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79753px;} .ft36{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78495px;} 
.ft37{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77855px;} .ft38{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78234px;} .ft39{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76726px;} .ft40{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04580px;} .ft41{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.92880px;} .ft42{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84356px;} .ft43{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.82962px;} .ft44{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83790px;} .ft45{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80870px;} .ft46{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80137px;} .ft47{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80265px;} .ft48{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79859px;} .ft49{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79262px;} .ft50{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80344px;} .ft51{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80419px;} .ft52{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.02340px;} .ft53{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88146px;} .ft54{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85470px;} .ft55{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.86156px;} .ft56{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85215px;} 
.ft57{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.03800px;} .ft58{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85680px;} .ft59{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76957px;} .ft60{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78810px;} .ft61{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77705px;} .ft62{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.80504px;} .ft63{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76955px;} .ft64{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77257px;} .ft65{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75435px;} .ft66{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.75527px;} .ft67{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.77550px;} .ft68{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76605px;} .ft69{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76186px;} .ft70{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.76474px;} .ft71{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.16572px;} .ft72{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:1.04940px;} .ft73{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.93998px;} .ft74{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.88517px;} .ft75{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.87275px;} .ft76{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85225px;} 
.ft77{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.84215px;} .ft78{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.85129px;} .ft79{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.83136px;} .ft80{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79986px;} .ft81{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.79684px;} .ft82{virtical-align:top;font-size:16px;font-family:Times;color:#231f20;letter-spacing:0.78436px;}
-->
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<IMG width="918" height="1188" src="Leveraging_AD_on_MOSXS_2.2008.png" alt="background image">
<DIV style="position:absolute;top:1079;left:108"><nobr><span class="ft0">8</span></nobr></DIV>
<DIV style="position:absolute;top:107;left:181"><nobr><span class="ft15">3. Click on your server's disclosure triangle in the table on the left and click on</span></nobr></DIV>
<DIV style="position:absolute;top:128;left:194"><nobr><span class="ft28">the Open Directory service to view its status. Click on the Settings button in</span></nobr></DIV>
<DIV style="position:absolute;top:150;left:194"><nobr><span class="ft29">the toolbar.</span></nobr></DIV>
<DIV style="position:absolute;top:489;left:181"><nobr><span class="ft39">4. Click on the "Change" button to open the Service Configuration Assistant.</span></nobr></DIV>
<DIV style="position:absolute;top:741;left:181"><nobr><span class="ft51">5. Choose to create an "Open Directory Master" and provide a username and</span></nobr></DIV>
<DIV style="position:absolute;top:762;left:194"><nobr><span class="ft56">password to administer the new domain.</span></nobr></DIV>
<DIV style="position:absolute;top:1036;left:181"><nobr><span class="ft70">6. In the third pane (Master Domain Info), you will be prompted for a Kerberos</span></nobr></DIV>
<DIV style="position:absolute;top:1057;left:194"><nobr><span class="ft82">Realm name and search base. The Kerberos realm name is arbitrary, but is</span></nobr></DIV>
</DIV>
</BODY>
</HTML>

If I convert without the -c option, I don't get the images. If anyone has suggestions on how to better convert my PDFs to HTML before importing, so that I can keep some of the formatting and, more importantly, keep my TOC, I'd be more than willing to try them. I should note that I'm using OS X.
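The sample above shows why the layout does not survive: every line of text is its own DIV pinned to pixel coordinates (top/left) on a 918×1188 page, with the page image drawn underneath. The following is a minimal Python sketch that pulls those fragments back out in reading order. It assumes the exact inline style format seen in the sample (`position:absolute;top:N;left:N`); the class and function names are hypothetical and are not part of Calibre or pdftohtml.

```python
import re
from html.parser import HTMLParser

# Assumption: pdftohtml -c writes each text line as a DIV with an inline style
# exactly like style="position:absolute;top:107;left:181" (as in the sample).
POS_RE = re.compile(r"position:absolute;top:(\d+);left:(\d+)")


class PositionedTextParser(HTMLParser):
    """Hypothetical helper: collect (top, left, text) for each absolutely positioned DIV."""

    def __init__(self):
        super().__init__()
        self.fragments = []   # list of (top, left, text) tuples
        self._current = None  # [top, left, [text parts]] while inside a positioned DIV

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "DIV" arrives as "div".
        if tag == "div":
            match = POS_RE.search(dict(attrs).get("style") or "")
            if match:
                self._current = [int(match.group(1)), int(match.group(2)), []]

    def handle_endtag(self, tag):
        # Close the positioned DIV we are inside; the outer relative DIV is ignored.
        if tag == "div" and self._current is not None:
            top, left, parts = self._current
            self.fragments.append((top, left, "".join(parts).strip()))
            self._current = None

    def handle_data(self, data):
        # Text inside <nobr>/<span> still arrives here while a positioned DIV is open.
        if self._current is not None:
            self._current[2].append(data)


def positioned_fragments(html_text):
    """Return the page's text fragments sorted top to bottom, then left to right."""
    parser = PositionedTextParser()
    parser.feed(html_text)
    parser.close()
    return sorted(parser.fragments)


if __name__ == "__main__":
    sample = """<DIV style="position:relative;width:918;height:1188;">
<DIV style="position:absolute;top:1079;left:108"><nobr><span class="ft0">8</span></nobr></DIV>
<DIV style="position:absolute;top:107;left:181"><nobr><span class="ft15">3. Click on your server's disclosure triangle</span></nobr></DIV>
<DIV style="position:absolute;top:150;left:194"><nobr><span class="ft29">the toolbar.</span></nobr></DIV>
</DIV>"""
    for top, left, text in positioned_fragments(sample):
        print(top, left, text)
    # 107 181 3. Click on your server's disclosure triangle
    # 150 194 the toolbar.
    # 1079 108 8
```

Note that nothing in the markup says which lines form a paragraph or where an image belongs relative to the text; only coordinates survive, which is why a reflowing reader cannot reproduce the page.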
04-07-2010, 12:21 AM | #2
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Complex HTML documents are not reflowable. Converting a PDF to complex HTML is largely pointless, since both formats use absolute positioning to place elements on the page.
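To get reflowable output, the positioned lines have to be turned back into ordinary paragraphs. Below is a rough Python sketch of one such heuristic; it is not what calibre does. It assumes the text has already been extracted as hypothetical (top, left, text) tuples, starts a new paragraph whenever the vertical gap between lines exceeds an assumed 30 px (the lines in the first post's sample are about 21 px apart), and drops digit-only fragments as probable page numbers. The sample data is copied from the page 8 code in the first post.

```python
from html import escape
from typing import List, Tuple

# Hypothetical fragment format: (top, left, text), pixel coordinates as written by pdftohtml -c.
Fragment = Tuple[int, int, str]


def reflow(fragments: List[Fragment], max_line_gap: int = 30) -> List[str]:
    """Merge absolutely positioned lines into paragraphs (heuristic, assumed threshold)."""
    paragraphs: List[List[str]] = []
    previous_top = None
    for top, _left, text in sorted(fragments):  # read top to bottom, then left to right
        text = text.strip()
        if not text or text.isdigit():
            continue  # skip empty fragments and probable page numbers
        if previous_top is None or top - previous_top > max_line_gap:
            paragraphs.append([])  # large vertical jump: start a new paragraph
        paragraphs[-1].append(text)
        previous_top = top
    return [" ".join(lines) for lines in paragraphs]


def to_html(paragraphs: List[str]) -> str:
    """Emit plain <p> elements with no positioning, so the reader can reflow them."""
    return "\n".join(f"<p>{escape(p, quote=False)}</p>" for p in paragraphs)


if __name__ == "__main__":
    page8 = [
        (1079, 108, "8"),
        (107, 181, "3. Click on your server's disclosure triangle in the table on the left and click on"),
        (128, 194, "the Open Directory service to view its status. Click on the Settings button in"),
        (150, 194, "the toolbar."),
        (489, 181, '4. Click on the "Change" button to open the Service Configuration Assistant.'),
        (741, 181, '5. Choose to create an "Open Directory Master" and provide a username and'),
        (762, 194, "password to administer the new domain."),
    ]
    # Prints three <p> elements (steps 3, 4 and 5); the page number "8" is dropped.
    print(to_html(reflow(page8)))
```

Hyphenated line endings, multi-column pages, tables, and image placement would all need extra handling, and the gap threshold would have to be tuned per document.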
04-07-2010, 12:29 AM | #3
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: iPod/Stanza
Ew, that's what I was afraid of. Could you suggest any other tools I could use as an intermediate step, so that my TOCs are preserved once I take the books through Calibre?
04-07-2010, 12:34 AM | #4
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Not at the moment. The new PDF engine in calibre (still under development) preserves TOCs automatically, but I won't be able to find the time to complete it for a while.
05-10-2010, 02:26 PM | #5
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: Sony ereader Prs-300
Have you tried Sigil?
I find that it converts HTML files to EPUB format pretty well.