| 
			
			 | 
		#1 | 
| 
			
			
			
			 Constant Reader 
			
			Posts: 12 
				Karma: 10 
				Join Date: Apr 2012 
				
				
				
				Device: HTC Sensation 4G (Kindle software) 
				
				
				 | 
	
	
	
		
		
			
			 
			
			I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:
	<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>
	However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:
	<h1><a name="<i>Toc244191671”></a>New York Bumper Stickers</h1>
	As a result: 
	In other cases, a very similar heading changes from:
	<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>
	to:
	<h1 style="margin-top:1em;margin-bottom:1em;"><a name="<i>Toc244191667”></a><a name=">Toc104266959”>You know you’re from </a>Jersey when …</h1>
	In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.
	Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text <i> being inserted at the start of the anchor's name attribute?  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Resident Curmudgeon 
			
			Posts: 80,782 
				Karma: 150249619 
				Join Date: Nov 2006 
				Location: Roslindale, Massachusetts 
				
				
				Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			My suggestion is to just delete the <a name="_Toc244191671"></a> type anchors in the headings. You don't actually need them.
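
If you would rather not delete the anchors by hand, the idea can be sketched in a few lines of Python. This is only an illustration, not part of Calibre: the function name `strip_toc_anchors` is made up, and it assumes the anchors look exactly like the ones quoted in the first post (`<a name="_Toc` followed by digits). Any text wrapped by an anchor is kept. Note that if the document has a table of contents that links to these anchors, those links will stop working.

```python
import re

# Matches Word-style bookmark anchors such as <a name="_Toc244191671"></a>
# and captures whatever text the anchor wraps (possibly nothing).
_TOC_ANCHOR = re.compile(
    r'<a\s+name="_Toc\d+"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_toc_anchors(html: str) -> str:
    """Remove _Toc bookmark anchors but keep any text they enclose."""
    return _TOC_ANCHOR.sub(r"\1", html)


if __name__ == "__main__":
    heading = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
    print(strip_toc_anchors(heading))
```

Run it over the filtered HTML before handing the file to Calibre.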
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | |
| 
			
			
			
			 Wizard 
			
			Posts: 1,613 
				Karma: 6718541 
				Join Date: Dec 2004 
				Location: Paradise (Key West, FL) 
				
				
				Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ... 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 One fix is to sweep the document with S&R to remove the underscore (e.g. replace "_Toc" with "Toc") before conversion.  | 
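
The same search and replace can be done outside an editor. A minimal sketch in Python, assuming the bookmarks only ever appear in `name="..."` and `href="#..."` attributes as in the headings quoted in the first post; the function name `rename_toc_bookmarks` is made up for illustration. Renaming both attributes together keeps any table-of-contents links pointing at the right place.

```python
import re

# Matches the start of a name="..." or href="#..." attribute whose value
# begins with _Toc, capturing everything before the underscore.
_TOC_ATTR = re.compile(r'((?:name|href)\s*=\s*"#?)_Toc', re.IGNORECASE)


def rename_toc_bookmarks(html: str) -> str:
    """Drop the leading underscore from _Toc bookmark names and links."""
    return _TOC_ATTR.sub(r"\1Toc", html)


if __name__ == "__main__":
    heading = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
    print(rename_toc_bookmarks(heading))
```

Only the attribute values are touched, so a literal "_Toc" in the body text is left alone.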
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Turn off heuristics in the conversion settings.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Resident Curmudgeon 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Well trained by Cats 
			
			Posts: 31,267 
				Karma: 61916422 
				Join Date: Aug 2009 
				Location: The Central Coast of California 
				
				
				Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 Constant Reader 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I knew there had to be a simple answer. Turning off Heuristics solved the problem -- without any editing. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Thank you, Kovid!  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Resident Curmudgeon 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			But do you really want to leave in code you don't actually need?
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| Tags | 
| corruption, html, rtf | 