MobileRead Forums

Oboe Joe 06-26-2010 06:17 PM

Import of HTML With Embedded <Style> Broken In 0.7.5
 
Using calibre 0.7.5, HTML is corrupted when imported:

--- Start HTML ---
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<meta/>Content-Type content="text/html; charset=windows-1252"&gt;
<meta/>Generator content="Microsoft Word 12 (filtered)"&gt;
<title>The XXXX of the XXXX</title>
<style>
<!--
SNIP (css looks fine)
-->
</style>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/></head>

<body/>EN-US link=blue vlink=purple&gt;

<div></div>WordSection1&gt;

<p/>MsoBodyText&gt;“INTRIGUING. ... Mr. XXXX’s elaborate tale works so well.
Imagination carries the day.”</html>
--- END OF HTML ---

Note that the imported file stops abruptly.

I can take the identical source HTML and import it in 0.7.4 without a problem. HTML files without a <style> sheet import fine with 0.7.5.

I'm using Windows XP SP 3.

Oboe Joe
---
Honk!

toddos 06-27-2010 06:28 AM

I just ran into the same problem. It looks like it's caused by attributes whose values are not in quotes. For example, if you convert RTF or DOC to HTML using Word, classes are generally not quoted (class=MsoNormal rather than class="MsoNormal"). Apparently some change in 0.7.5's HTML processing no longer handles that. In my case, I was able to clean up the HTML easily enough by adding quotes around the unquoted attributes.

Hopefully it's just a bug and not a new feature :)
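The quoting workaround can be scripted before import. Below is a minimal Python sketch, assuming Word-style HTML without `<script>` blocks or unterminated quotes. `quote_attributes` is a hypothetical helper, not part of calibre, and it uses regular expressions rather than a real HTML parser, so treat it as a quick pre-processing step rather than a general fix.

```python
import re

# A start tag: "<" plus a letter, then any run of quoted strings or characters
# other than quotes and ">", up to the closing ">". Quoted strings are consumed
# whole, so a ">" inside a quoted value does not end the tag. End tags ("</p>"),
# comments and declarations ("<!...") do not match.
TAG_RE = re.compile(r"""<[A-Za-z](?:"[^"]*"|'[^']*'|[^'">])*>""")

# One attribute with a value: name, "=", then a double-quoted, single-quoted or
# unquoted value. Quoted values are matched too, so the scan skips over their
# contents instead of finding "name=value" pairs inside them.
ATTR_RE = re.compile(r"""([^\s"'<>/=]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attr(m: re.Match) -> str:
    name, value = m.groups()
    if value[0] in "\"'":
        return m.group(0)  # already quoted: leave untouched
    return f'{name}="{value}"'  # unquoted: wrap in double quotes


def quote_attributes(html: str) -> str:
    """Double-quote every unquoted attribute value found in start tags.

    Hypothetical helper for cleaning Word-exported HTML; not part of calibre.
    """
    def fix_tag(tag: re.Match) -> str:
        return ATTR_RE.sub(_quote_attr, tag.group(0))

    return TAG_RE.sub(fix_tag, html)


if __name__ == "__main__":
    sample = '<div class=WordSection1><p class=MsoBodyText>Text</p></div>'
    print(quote_attributes(sample))
    # prints: <div class="WordSection1"><p class="MsoBodyText">Text</p></div>
```

Matching quoted values as well as unquoted ones is deliberate: it keeps the pattern from rewriting text such as `charset=windows-1252` that sits inside an already-quoted `content="..."` value.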

kovidgoyal 06-27-2010 11:35 AM

Will be fixed in next release.


All times are GMT -4.