MobileRead Forums > E-Book Software > Calibre > Library Management
Old 01-12-2012, 05:32 PM   #1
bwana
Member
Posts: 14
Karma: 10
Join Date: Mar 2010
Device: none
adding an html doc to calibre

I have a document in HTML format, but when I add it to calibre it appears as a ZIP file. When I try to 'open' it within calibre, all I get is a Finder window showing the ZIP file. I thought maybe calibre didn't know what the character encoding was, so I went to Advanced->Plugins->HTML to zip plugin and added utf-8.

I tried to convert to EPUB, and all the links to the pictures and diagrams are missing. I know the HTML is correct because the pictures show up when I open the raw HTML file in Safari.

Can someone help me with HTML import, or point me to some examples where these issues are solved? Thanks.

Sigil gives me these errors:
line 3 The <language> element is missing.
line 3 The <title> element is missing.
line 236 attribute 'page' is not declared for element 'image'
line 236 no declaration found for element 'image'
line 295 element 'image' is not allowed for content model
'(p|h1|h2|h3|h4|h5|h6|div|ul|ol|dl|pre|hr|blockquote|address|fieldset|table|switch|form|noscript|ins|del|script)'

Line 236 is blank, and line 295 is just some text, with no image references or any HTML code. I guess Sigil counts lines differently than text editors do.

Last edited by bwana; 01-12-2012 at 05:52 PM.
Old 01-12-2012, 06:29 PM   #2
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
It is normal for calibre to store an HTML document as a ZIP file. It should pull all the images into that ZIP as well, as long as all the image files are local. Is that the case here?
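One way to answer the "are all the image files local?" question before adding the file is to list every image reference in the HTML and check whether it resolves to a file next to the document. This is only a rough sketch using Python's standard library; the names `ImageCollector` and `classify_images` are made up for illustration and are not part of calibre.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <IMG SRC="..."> is matched as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def classify_images(html_text, base_dir):
    """Return (found, missing, remote) lists of image references.

    base_dir is the folder that contains the HTML file.
    """
    collector = ImageCollector()
    collector.feed(html_text)
    found, missing, remote = [], [], []
    for src in collector.sources:
        if urlparse(src).scheme in ("http", "https"):
            remote.append(src)       # not local, would have to be downloaded
        elif (Path(base_dir) / src).is_file():
            found.append(src)        # local and present on disk
        else:
            missing.append(src)      # local reference, but no such file
    return found, missing, remote


# Hypothetical sample; the second URL is invented for the demo.
sample = '<br><IMG SRC="p544a.png"><br><img src="http://example.com/x.png">'
found, missing, remote = classify_images(sample, ".")
print("found:", found)
print("missing:", missing)
print("remote:", remote)
```

Anything that lands in `missing` or `remote` is a reference that would need fixing or downloading before the import can pick it up.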
Old 01-12-2012, 06:41 PM   #3
theducks
Well trained by Cats
Posts: 30,914
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by bwana
sigil gives me these errors:
line 3 The <language> element is missing.
line 3 The <title> element is missing.
line 236 attribute 'page' is not declared for element 'image'
line 236 no declaration found for element 'image'
line 295 element 'image' is not allowed for content model
'(p|h1|h2|h3|h4|h5|h6|div|ul|ol|dl|pre|hr|blockquote|address|fieldset|table|switch|form|noscript|ins|del|script)'

Line 236 is blank, and line 295 is just some text, with no image references or any HTML code. I guess Sigil counts lines differently than text editors do.
Line 3: tap F8 in Sigil (the language will default), fill in the title, then press Enter.
Image? Is that supposed to be img?
Back up from line 295; something is wrong earlier. Line 295 is just where it became obvious.
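To follow up on the "image vs. img" question, a short script can report where an `image` element actually occurs in the source file. This is a minimal sketch with Python's standard `html.parser`; `TagLocator` and `locate_tag` are illustrative names, and the reported line numbers refer to the file as fed to the script, which may not match the numbers Sigil prints.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Record the position and attributes of every start tag with a given name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted.lower()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted:
            line, column = self.getpos()  # line is 1-based, column is 0-based
            self.hits.append((line, column, dict(attrs)))


def locate_tag(html_text, wanted):
    """Return a list of (line, column, attributes) for each <wanted> start tag."""
    locator = TagLocator(wanted)
    locator.feed(html_text)
    return locator.hits


# Made-up snippet that imitates the kind of markup Sigil complained about.
sample = '<p>some text</p>\n\n<image page="12">\n'
for line, column, attrs in locate_tag(sample, "image"):
    print(f"line {line}, column {column}: <image> with attributes {attrs}")
```

Running `locate_tag(text, "image")` on the real document should show whether a stray `image` element with a `page` attribute exists and where it starts.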
Old 01-12-2012, 08:23 PM   #4
bwana
Quote:
Originally Posted by itimpi
It is normal for calibre to store an HTML document as a ZIP file. It should pull all the images into that ZIP as well, as long as all the image files are local. Is that the case here?
Yes, I opened the ZIP file and found:
content.opf
META-INF folder containing a container.xml file
metainf file
mimetype file
the original HTML file
all the PNG files referenced in the HTML file
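The same check can be done without unpacking anything by hand. The sketch below lists an archive's members with Python's `zipfile` module and separates the images from the rest; `summarize_archive` is an illustrative helper, and the demo archive is built in memory with placeholder contents rather than being a real calibre ZIP.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def summarize_archive(zip_source):
    """Split the archive's member names into images and everything else.

    zip_source may be a file path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = sorted(archive.namelist())
    images = [n for n in names if n.lower().endswith(IMAGE_EXTENSIONS)]
    others = [n for n in names if n not in images]
    return images, others


# Demo on a tiny in-memory archive with placeholder contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "placeholder")
    demo.writestr("book.html", '<br><IMG SRC="p544a.png"><br>')
    demo.writestr("p544a.png", b"placeholder, not a real PNG")

images, others = summarize_archive(buffer)
print("images:", images)
print("others:", others)
```

For a real check, pass the path of the ZIP that calibre stored in the library folder instead of `buffer`.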



I set the EPUB output option in the preferences to 'flatten epub'.
I went through the document and found some umlauts that needed converting to their HTML character codes. The resulting EPUB now validates fine in Sigil.
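Converting the umlauts by hand gets tedious in a long document. A rough sketch of doing it in one pass is below: every non-ASCII character is replaced with a numeric character reference. `to_ascii_html` is a made-up helper name, and the sketch assumes the text has already been read in with the correct encoding.

```python
def to_ascii_html(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" turns e.g. "ü" (code point 252) into "&#252;".
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Müller und Söhne"))
```

The result is plain ASCII, so it no longer matters which encoding the converter guesses for the file.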

Now I can view the EPUB in calibre and it displays correctly with the images, but Stanza does not show the images. The Firefox add-on displays the images properly. Book Reader 1.23 renders the images fine. Adobe Digital Editions displays the EPUB and the images; however, the images stay overlaid on the text even when the pages are flipped. Adobe software is crippled.
Maybe EPUBs are not meant to be read on a Mac.

I wish I knew why Stanza was choking on the images and skipping them. The HTML for an image looks like this, for example:
<br><IMG SRC="p544a.png"><br>

Also, sprinkled throughout the text are chapter headings consisting of the word 'Chapter' followed by the number. Calibre Preferences->Common Options->Structure Detection shows this for the 'detect chapters at xpath' setting:
//*[((name()='h1' or name()='h2') and re:test(., '\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))', 'i')) or @class = 'chapter']
No, I have not set off the word 'chapter' with h1 or h2 or any other HTML markup; it is buried in the text.
It would seem that the word 'chapter' should match, but it does not. And there is no table of contents either.
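The expression above only tests elements named h1 or h2 (or anything with class 'chapter'), so a 'Chapter N' that sits in ordinary text is never examined, even though the regular expression itself would match it. One possible workaround is to wrap such lines in h2 tags before adding the file. The sketch below assumes each chapter title sits alone on its own line, optionally followed by a `<br>`, and that the file uses plain `\n` line endings; `promote_chapters` is a hypothetical helper, not a calibre function.

```python
import re

# The regular expression from the structure detection setting, for comparison.
CALIBRE_PATTERN = re.compile(
    r"\s*((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)(\s+|$))",
    re.IGNORECASE,
)

# Assumption: a chapter title is a line containing only "Chapter <number>",
# optionally followed by a <br> tag.
CHAPTER_LINE = re.compile(
    r"^[ \t]*(Chapter[ \t]+\d+)[ \t]*(?:<br\s*/?>)?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def promote_chapters(html_text):
    """Wrap standalone 'Chapter N' lines in <h2> tags."""
    return CHAPTER_LINE.sub(r"<h2>\1</h2>", html_text)


sample = "end of a page<br>\nChapter 12<br>\nIn Chapter 3 we saw that..."
print(promote_chapters(sample))

# The text itself matches the pattern; what was missing is the h1/h2 element.
print(bool(CALIBRE_PATTERN.search("Chapter 12")))
```

Mentions of a chapter inside a sentence are left alone because the whole line has to consist of the title. After this step the default expression has h2 elements to test.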

Interestingly, adding a TXT version of the document to calibre results in properly rendered chapters when converted to EPUB: the word 'Chapter' follows a page break and is bold and in a larger font.

Last edited by bwana; 01-12-2012 at 09:53 PM.