#1
Constant Reader
Posts: 12
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)

I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors.
The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:
<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>
However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:
<h1><a name="<i>Toc244191671”></a>New York Bumper Stickers</h1>
As a result:
In other cases, a very similar heading changes from:
<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>
to:
<h1 style="margin-top:1em;margin-bottom:1em;"><a name="<i>Toc244191667”></a><a name=">Toc104266959”>You know you’re from </a>Jersey when …</h1>
In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.
Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text <i> being inserted before the text of the anchor name attribute?

#2
Resident Curmudgeon
Posts: 80,778
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3

My suggestion is to simply delete the <a name="_Toc244191671">-style anchors from the headings. You don't actually need them.
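
As a rough illustration of this suggestion, the sketch below strips Word's `_Toc` bookmark anchors from the filtered HTML before conversion while keeping any text they wrap. The function name `strip_toc_anchors` and the regular expression are assumptions made for illustration and are not part of calibre. It also assumes the anchors are never nested and that nothing in the document links to these bookmarks (a Word-generated table of contents would lose its targets).

```python
import re

# Matches <a name="_Toc123..."> ... </a> and captures whatever the anchor wraps.
# Assumption: these anchors are never nested inside one another.
_TOC_ANCHOR = re.compile(
    r'<a\s+name="_Toc\d+"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_toc_anchors(html: str) -> str:
    """Remove Word's _Toc bookmark anchors, keeping any text they wrap."""
    return _TOC_ANCHOR.sub(r"\1", html)


# Example using the first heading quoted in post #1.
heading = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
cleaned = strip_toc_anchors(heading)
print(cleaned)
```

For the heading above this should leave only the `<h1>` element and its text; an anchor that wraps part of the heading text is unwrapped rather than deleted.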

#3
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...

One fix is to run a search and replace (S&R) over the document to remove the underscore (e.g. replace "_Toc" with "Toc") before conversion.
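
A minimal sketch of that search and replace, done in Python rather than in an editor. The function name `rename_toc_bookmarks` is hypothetical, and the code assumes the bookmark names are double-quoted exactly as in the headings quoted in post #1. It renames both the bookmark targets and any internal links to them, so that existing links keep working.

```python
def rename_toc_bookmarks(html: str) -> str:
    """Drop the leading underscore from Word's _Toc bookmark names and links."""
    # Rename the bookmark targets, e.g. name="_Toc244191671" -> name="Toc244191671".
    html = html.replace('name="_Toc', 'name="Toc')
    # Rename internal links to those bookmarks so they still point at them.
    return html.replace('href="#_Toc', 'href="#Toc')


sample = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
renamed = rename_toc_bookmarks(sample)
print(renamed)
```

Restricting the replacement to `name` and `href` attributes, instead of replacing every "_Toc" in the file, avoids touching body text that happens to contain the same characters.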

#4
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Turn off heuristics in the conversion settings.
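
The setting referred to here is in the conversion dialog. For anyone scripting conversions instead, the sketch below builds a command line for calibre's `ebook-convert` tool, assuming it is installed and on the PATH. As far as I know, the command-line tool only applies heuristic processing when `--enable-heuristics` is passed, so leaving the flag out is the equivalent of turning heuristics off; check this against the version of calibre in use. The function name `build_convert_command` and the file names are hypothetical.

```python
from typing import List


def build_convert_command(src: str, dst: str, heuristics: bool = False) -> List[str]:
    """Build an ebook-convert command line, with heuristics off by default."""
    cmd = ["ebook-convert", src, dst]
    if heuristics:
        # Only request heuristic processing when explicitly asked for.
        cmd.append("--enable-heuristics")
    return cmd


# Hypothetical input and output file names.
command = build_convert_command("book.html", "book.epub")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to perform the conversion.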

#5
Resident Curmudgeon
Posts: 80,778
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3

I still prefer my solution of actually removing the <a> from the code. It's not needed and just adds bloat.

#6
Well trained by Cats
Posts: 31,266
Karma: 61916422
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A

#7
Constant Reader
Posts: 12
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)

I knew there had to be a simple answer. Turning off Heuristics solved the problem -- without any editing.
Thank you, Kovid!

#8
Resident Curmudgeon
Posts: 80,778
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3

But do you really want to leave in code you don't actually need?

Tags: corruption, html, rtf