06-26-2010, 06:17 PM   #1
Oboe Joe
Import of HTML With Embedded <Style> Broken In 0.7.5

Using 0.7.5, HTML is corrupted when imported. This is what the imported file looks like:

--- Start HTML ---
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<meta/>Content-Type content="text/html; charset=windows-1252"&gt;
<meta/>Generator content="Microsoft Word 12 (filtered)"&gt;
<title>The XXXX of the XXXX</title>
<style>
<!--
SNIP (css looks fine)
-->
</style>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/></head>

<body/>EN-US link=blue vlink=purple&gt;

<div></div>WordSection1&gt;

<p/>MsoBodyText&gt;“INTRIGUING. ... Mr. XXXX’s elaborate tale works so well.
Imagination carries the day.”</html>
--- END OF HTML ---

Note that the imported file stops abruptly.

I can take the identical source HTML and import it into 0.7.4 without a problem. HTML files without a <style> block import fine with 0.7.5.

I'm using Windows XP SP 3.

Oboe Joe
---
Honk!
06-27-2010, 06:28 AM   #2
toddos
I just ran into the same problem. It looks like it's caused by tag attributes whose values are not in quotes. For example, if you're converting RTF or DOC to HTML using Word, classes are generally not quoted (class=MsoNormal rather than class="MsoNormal"). Apparently some change to HTML processing in 0.7.5 no longer handles that. In my case, I was able to clean up the HTML easily enough by adding quotes around the unquoted attributes.
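
For anyone who wants to script that cleanup, here is a rough sketch in Python. It is not how calibre processes HTML, and the function name quote_unquoted_attributes is made up for illustration. It assumes the markup is simple Word-style output: it uses regular expressions rather than a real HTML parser, so it will misfire on attribute values that contain ">" or on tag-like text inside scripts.

```python
import re

# A start tag such as <p class=MsoNormal>. End tags (</p>) and
# comments (<!-- ... -->) do not begin with "<" + letter, so they are skipped.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")

# name=value inside a tag. Quoted values are listed first so that text
# inside them (e.g. charset=windows-1252) is consumed and never re-matched.
ATTR_RE = re.compile(
    r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def _quote_attr(match):
    """Return the attribute unchanged if quoted, else wrap its value in quotes."""
    name, _double_quoted, _single_quoted, bare = match.groups()
    if bare is None:
        return match.group(0)  # already quoted, leave untouched
    return f'{name}="{bare}"'


def quote_unquoted_attributes(html):
    """Add double quotes around unquoted attribute values in start tags."""
    return TAG_RE.sub(lambda m: ATTR_RE.sub(_quote_attr, m.group(0)), html)


if __name__ == "__main__":
    # Hypothetical sample in the style of Word's "filtered" HTML export.
    sample = "<p class=MsoNormal>Some text</p>"
    print(quote_unquoted_attributes(sample))
```

Run it on a copy of the file and compare the result against the original before importing, since the regex approach is only an approximation.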

Hopefully it's just a bug and not a new feature.
06-27-2010, 11:35 AM   #3
kovidgoyal
creator of calibre
Will be fixed in the next release.
All times are GMT -4.