Old 03-10-2011, 12:26 PM   #1
haven
Member
Posts: 10
Karma: 34
Join Date: Nov 2008
Device: Kindle 3, iPad, Iliad Irex
HTML to epub displays munged source code

I'm having trouble getting Calibre to do a basic single-file HTML to epub conversion. I think this may be a CSS-related bug in Calibre 0.7.48 (Win7-64), but since everything I know about HTML and CSS was learned in the last few hours trying to sort this out myself, the problem could well be the nut behind the wheel -- I don't know enough to file an actual bug report :-)

Calibre seems to get confused when combining in-line HTML boldface formatting with the "white-space" CSS construct:
code { white-space: nowrap; }

That is, when style.css includes the above, Calibre's formatting is thrown off by HTML like
<CODE><B>Pragma</B></CODE>

The epub book that Calibre generates will display the code somewhere on the page other than inline. I'm attaching screenshot .png files to show the source HTML displayed in Firefox 3.6.15 and the epub as displayed by Calibre's built-in viewer.

Those screen shots come from a very stripped down HTML file and the actual style.css file for the book I was trying to convert. I'm attaching those files as well (camouflaged as .txt to make the Forum software happy.) Please let me know anything else that would be helpful.
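For anyone who wants to try this without the attachments, here is a minimal sketch that writes a reproduction to disk. It is a hypothetical reconstruction built only from the two snippets quoted above; the file name repro.html, the surrounding markup, and the sample sentence are assumptions, and the real attachments are larger.

```python
from pathlib import Path
import tempfile

# Hypothetical reconstruction from the two snippets quoted in the post;
# the real attachments (testdoc1.html.txt, style.css.txt) contain more.
CSS = "code { white-space: nowrap; }\n"
HTML = """<HTML>
<HEAD>
<TITLE>white-space test</TITLE>
<LINK REL="stylesheet" TYPE="text/css" HREF="style.css">
</HEAD>
<BODY>
<P>The <CODE><B>Pragma</B></CODE> keyword should stay inline.</P>
</BODY>
</HTML>
"""


def write_repro(directory):
    """Write repro.html and style.css into directory; return the HTML path."""
    directory = Path(directory)
    (directory / "style.css").write_text(CSS, encoding="utf-8")
    html_path = directory / "repro.html"
    html_path.write_text(HTML, encoding="utf-8")
    return html_path


if __name__ == "__main__":
    out = write_repro(tempfile.mkdtemp())
    # Then convert with calibre's command-line tool, for example:
    #   ebook-convert repro.html repro.epub
    print(out)
```

The upper-case tags are kept on purpose, since the original book used them and they may matter for debugging.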
Attached Thumbnails: from-HTML.png (37.9 KB), from-Calibre-epub-viewer.png (97.0 KB)
Attached Files: testdoc1.html.txt (1.2 KB), style.css.txt (1.9 KB)
Old 03-10-2011, 12:41 PM   #2
kovidgoyal
creator of calibre
Posts: 25,416
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
nowrap makes no sense on an inline element, only on a block level element. Some html renderers will automatically convert an inline element with nowrap to a block element, others will ignore the nowrap. The former is what you see happening.
Old 03-10-2011, 12:44 PM   #3
kovidgoyal
Oh, and do not use upper case for tag and attribute names; that is so 1990.
Old 03-10-2011, 01:52 PM   #4
haven
Quote:
Originally Posted by kovidgoyal View Post
nowrap makes no sense on an inline element, only on a block level element. Some html renderers will automatically convert an inline element with nowrap to a block element, others will ignore the nowrap. The former is what you see happening.
That makes sense to me intuitively. I'm inexperienced enough in this territory that I don't even know what standards apply. So http://www.w3.org/TR/CSS2/text.html#white-space-prop may be the wrong lookup, but if it does apply, w3.org defines white-space as applying to "all elements", no?

My example code fragment was a trimmed, sanitized version of an online book from a well-regarded author and publisher. (Their upper case looked quaint to me too, but I left it verbatim in case it was relevant to debugging.) Now that I've found the issue, which pervades the book, I can fix it for my own purposes with an awk script or, worst case, by writing my own parser. I thought it might be helpful to write it up, since similar code blocks could affect others. I'm attaching a picture of the differential output before and after Calibre parses the HTML fragment I posted ... the purple highlighting shows where the Calibre parser gets confused by the assumption that white-space implies block level.

Thanks for the incredibly quick reply (and for your invaluable labor-of-love in creating Calibre. I'm humbled by your contribution and very appreciative.)
Attached Thumbnails: raw-vs-parsed-html-highlighted.png (77.6 KB)
Old 03-10-2011, 10:08 PM   #5
kovidgoyal
This doesn't have anything to do with the CSS; it's likely caused by the uppercase tags. I've added code to lowercase the tags automatically before parsing for the next release, which should solve the problem for you.
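Until that release is available, the same idea can be applied to the source file by hand. The following is a rough sketch, not calibre's actual code: the function name lowercase_tags and both regular expressions are made up for illustration. It lowercases tag and attribute names while leaving attribute values and text content alone. Being regex-based, it assumes no ">" inside attribute values and will also touch tag-like text inside comments or scripts.

```python
import re

# An opening or closing tag: "<", optional "/", the tag name, then
# everything up to the closing ">" (assumes no ">" inside attribute values).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

# An attribute name, optionally followed by "=" and a quoted or bare value.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def _lower_tag(match):
    slash, name, rest = match.groups()
    # Lowercase each attribute name; keep its value exactly as written.
    rest = ATTR_RE.sub(lambda m: m.group(1).lower() + (m.group(2) or ""), rest)
    return "<" + slash + name.lower() + rest + ">"


def lowercase_tags(html):
    """Return html with tag and attribute names lowercased."""
    return TAG_RE.sub(_lower_tag, html)


if __name__ == "__main__":
    print(lowercase_tags("<CODE><B>Pragma</B></CODE>"))
```

Run over the fragment from the first post, this turns <CODE><B>Pragma</B></CODE> into <code><b>Pragma</b></code>.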
Tags
calibre, css, epub, white-space

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
epub code snippets (html / css) zelda_pinwheel ePub 179 10-05-2012 06:37 PM
epub to mobi - Displays Html tags stevec1409 Conversion 7 02-14-2011 03:41 PM
Options to show programming source code in ePub pedgarcia ePub 2 07-21-2010 10:41 AM
Let's create a source code repository for DR 800 related code? jraf iRex 3 03-11-2010 12:26 PM
ebook-convert HTML to EPUB and problem with <pre><code> mikegr Calibre 2 03-09-2010 02:27 PM


All times are GMT -4. The time now is 10:30 AM.


MobileRead.com is a privately owned, operated and funded community.