04-19-2012, 09:55 PM | #1 |
Constant Reader
Posts: 12
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)
|
Headings going awry
I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors.
The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:
<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>
However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:
<h1><a name="<i>Toc244191671”></a>New York Bumper Stickers</h1>
As a result:
In other cases, a very similar heading changes from:
<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>
to:
<h1 style="margin-top:1em;margin-bottom:1em;"><a name="<i>Toc244191667”></a><a name=">Toc104266959”>You know you’re from </a>Jersey when …</h1>
In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.
Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text <i> being inserted before the text of the anchor name attribute? |
04-19-2012, 10:29 PM | #2 |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
My suggestion is to just delete the <a name="_Toc244191671"> type code in the headers. You don't actually need it.
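If there are too many headings to clean by hand, the deletion can be scripted. This is only a rough sketch, assuming the anchors always look like the ones quoted in the first post (name="_Toc" followed by digits, not nested). The function name strip_toc_anchors is made up for illustration and is not part of calibre. Note that if the document has a table of contents linking to these bookmarks, removing them will break those links.

```python
import re

# Matches Word-generated bookmark anchors such as <a name="_Toc244191671">...</a>.
# Assumption: the anchors are never nested inside one another.
TOC_ANCHOR = re.compile(r'<a\s+name="_Toc\d+"\s*>(.*?)</a>',
                        re.IGNORECASE | re.DOTALL)

def strip_toc_anchors(html):
    """Remove _Toc bookmark anchors but keep any text they wrap."""
    return TOC_ANCHOR.sub(r'\1', html)

if __name__ == "__main__":
    sample = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
    print(strip_toc_anchors(sample))
```

Run it over the filtered HTML before handing the file to calibre.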
|
04-20-2012, 09:08 AM | #3 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
One fix is to sweep the document with search and replace (S&R) to remove the underscore (e.g. replace "_Toc" with "Toc") before conversion. |
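The same sweep can be done outside an editor. A minimal sketch, assuming the bookmarks only ever appear as name="_Toc..." or as href="#_Toc..." links; rename_toc_bookmarks is a hypothetical helper name. A plain replace of "_Toc" with "Toc" would also work, but limiting the match to text that follows a quote or a # avoids touching body text.

```python
import re

def rename_toc_bookmarks(html):
    """Drop the leading underscore from _Toc bookmark names and links to them."""
    # Only rewrite _Toc when it directly follows a double quote or '#',
    # i.e. inside name="_Toc..." or href="#_Toc...".
    return re.sub(r'(["#])_Toc(\d+)', r'\1Toc\2', html)

if __name__ == "__main__":
    sample = '<a href="#_Toc244191671">Go</a> <h1><a name="_Toc244191671"></a>Title</h1>'
    print(rename_toc_bookmarks(sample))
```

Because links and targets are renamed together, an existing table of contents should keep working.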
|
04-20-2012, 10:14 AM | #4 |
creator of calibre
Posts: 44,336
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Turn off heuristics in the conversion settings.
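As to why heuristics would produce the <i>: the heuristic options include one that italicizes common patterns, and text wrapped in underscores appears to be one of them. The snippet below is not calibre's actual code. The regex is an assumption, used only to show how a rule of that kind, applied to raw markup, would turn a pair of "_Toc" names into the damage described in the first post.

```python
import re

def naive_italicize(markup):
    """Illustrative only: wrap anything between two underscores in <i>...</i>."""
    return re.sub(r'_([^_]+)_', r'<i>\1</i>', markup)

if __name__ == "__main__":
    heading = ('<h1><a name="_Toc244191667"></a>'
               '<a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>')
    # The two underscores sit inside two different attribute values, so the
    # "italic" span swallows the markup between them.
    print(naive_italicize(heading))
```

With only one underscore in a heading, a rule like this would pair it with the next underscore further down the file, which would fit the first case, where the damage does not stop at the end of the heading.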
|
04-20-2012, 10:23 AM | #5 |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
|
04-20-2012, 10:37 AM | #6 |
Well trained by Cats
Posts: 30,371
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
04-20-2012, 12:45 PM | #7 |
Constant Reader
Posts: 12
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)
|
I knew there had to be a simple answer. Turning off Heuristics solved the problem -- without any editing.
Thank you, Kovid! |
04-26-2012, 04:19 PM | #8 |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But do you really want to leave in code you don't actually need?
|
Tags |
corruption, html, rtf |