06-26-2010, 05:17 PM | #1 |
Junior Member
Posts: 5
Karma: 12
Join Date: Jun 2010
Location: Houston, TX
Device: Kindle DX, iPhone
|
Import of HTML With Embedded <Style> Broken In 0.7.5
Using 0.7.5, HTML is corrupted when imported:
--- Start HTML ---
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta/>Content-Type content="text/html; charset=windows-1252">
<meta/>Generator content="Microsoft Word 12 (filtered)">
<title>The XXXX of the XXXX</title>
<style> <!-- SNIP (css looks fine) --> </style>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/></head>
<body/>EN-US link=blue vlink=purple>
<div></div>WordSection1>
<p/>MsoBodyText>“INTRIGUING. ... Mr. XXXX’s elaborate tale works so well. Imagination carries the day.”</html>
--- END OF HTML ---
Note that the imported file stops abruptly. I can take the identical source HTML and import it in 0.7.4 without a problem. HTML files without a <style> sheet import fine with 0.7.5. I'm using Windows XP SP 3.
Oboe Joe --- Honk! |
06-27-2010, 05:28 AM | #2 |
Guru
Posts: 695
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
|
I just ran into the same problem. It looks like it's caused by tag attributes whose values are not in quotes. For example, if you're converting RTF or DOC to HTML using Word, classes are generally not quoted (class=MsoNormal rather than class="MsoNormal"). Apparently some change to HTML processing in 0.7.5 no longer likes that. In my case, I was able to clean up the HTML easily enough by adding quotes around the unquoted attributes.
Hopefully it's just a bug and not a new feature |
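For anyone stuck with a lot of Word-exported files, the manual cleanup described above (adding quotes around unquoted attribute values) could be scripted roughly like this. This is only a sketch, not anything calibre does internally: the function name quote_unquoted_attributes and both regexes are made up for illustration, and it assumes no ">" appears inside a quoted attribute value and that unquoted values are not immediately followed by "/>".

```python
import re

# Matches an opening tag such as <p class=MsoNormal>; comments (<!-- ... -->),
# closing tags and the <?xml ...?> declaration are left alone.
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")

# Matches name=value where the value is double-quoted, single-quoted, or bare.
# Quoted values are consumed whole, so text like "charset=..." inside them
# is never mistaken for a separate attribute.
ATTR_RE = re.compile(r"""([\w:-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attr(match):
    """Wrap a bare attribute value in double quotes; keep quoted ones as-is."""
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)
    return f'{name}="{value}"'


def quote_unquoted_attributes(html):
    """Return html with every unquoted attribute value inside a tag quoted."""
    return TAG_RE.sub(lambda m: ATTR_RE.sub(_quote_attr, m.group(0)), html)


if __name__ == "__main__":
    sample = "<body lang=EN-US link=blue vlink=purple><p class=MsoBodyText>Hi</p></body>"
    print(quote_unquoted_attributes(sample))
```

Run it over the HTML Word produced before importing; anything already quoted is passed through unchanged. A real HTML parser would be more robust than regexes for messy input.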
06-27-2010, 10:35 AM | #3 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Will be fixed in next release.
|
|