Old 04-19-2012, 09:55 PM   #1
MikeMJ
Constant Reader
Posts: 12
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)
Headings going awry

I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors.

The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:

<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>

However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:

<h1><a name="&lt;i&gt;Toc244191671”&gt;&lt;/a&gt;New York Bumper Stickers&lt;/h1&gt;

As a result:
  • The paragraphs that follow the <h1> are appearing in the large font of the <h1>
  • The reference from the <a> is appearing as text
  • The text of the <h1> is not visible

In other cases, a very similar heading changes from:

<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>

to:

<h1 style="margin-top:1em;margin-bottom:1em;"><a name="&lt;i&gt;Toc244191667”&gt;&lt;/a&gt;&lt;a name=">Toc104266959”&gt;You know you’re from </a>Jersey when …</h1>

In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.

Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text:

&lt;i&gt;

being inserted before the text of the anchor name attribute?
Old 04-19-2012, 10:29 PM   #2
JSWolf
Resident Curmudgeon
Posts: 73,661
Karma: 127838198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
My suggestion is to just delete the <a name="_Toc244191671"> type code in the headers. You don't actually need it.
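
A minimal sketch of that cleanup in Python, assuming the Word bookmarks always look exactly like <a name="_Toc..."> followed by digits, with no other attributes and no nested anchors. The function name unwrap_toc_anchors is made up for illustration. Note that any links pointing at these anchors (for example a Word-generated table of contents) would lose their targets.

```python
import re

# Matches Word's auto-generated bookmark anchors, e.g. <a name="_Toc244191671">...</a>.
# Assumption: the anchor carries only a name attribute and contains no nested <a>.
TOC_ANCHOR = re.compile(r'<a\s+name="_Toc\d+">(.*?)</a>', re.DOTALL)


def unwrap_toc_anchors(html: str) -> str:
    """Drop the _Toc anchors but keep any text they wrapped."""
    return TOC_ANCHOR.sub(r"\1", html)


if __name__ == "__main__":
    sample = '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>'
    print(unwrap_toc_anchors(sample))
```

A regular expression is enough here only because the pattern is so rigid; for messier HTML a real parser would be the safer choice.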
Old 04-20-2012, 09:08 AM   #3
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by MikeMJ
I am trying to...
"_Toc...

the heading has been changed to:
"&lt;i&gt;Toc...

Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text:

"&lt;i&gt;

being inserted before the text of the anchor name attribute?
Quite obviously, the underscore preceding "Toc" is giving calibre grief. Calibre seems to be misinterpreting it as the old (but still frequently seen) convention of wrapping a portion of text in underscores to indicate italics, so it replaces the underscore with an <i> tag. Later processing then converts the < to &lt; and the > to &gt;.
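
To make the suspected mechanism concrete, here is a toy reproduction in Python. The pattern below is a hypothetical stand-in for an "underscores mean italics" rule; it is not calibre's actual code, and the function name naive_italicize is invented for this sketch.

```python
import re

# Hypothetical "underscores mean italics" rule: an underscore that follows
# whitespace, '>' or '"' opens italics, and the next underscore closes them.
# This is NOT calibre's code; it only shows how such a rule can misfire on markup.
UNDERSCORE_ITALICS = re.compile(r'(?<=[\s>"])_([^_]+)_')


def naive_italicize(text: str) -> str:
    """Wrap _text_ in <i> tags without checking whether we are inside a tag."""
    return UNDERSCORE_ITALICS.sub(r"<i>\1</i>", text)


heading = ('<h1><a name="_Toc244191667"></a>'
           '<a name="_Toc104266959">You know you\'re from </a>Jersey when ...</h1>')

print(naive_italicize("He was _very_ late."))  # He was <i>very</i> late.
print(naive_italicize(heading))
```

On the heading, the rule pairs the underscore in the first name attribute with the underscore in the second one, so the <i> lands inside the first attribute value and the </i> inside the second. A lone underscore is left untouched by this toy rule; it only fires when a second underscore appears later in the text, which a file full of _Toc bookmarks readily supplies.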

One fix is to sweep the document with search and replace (S&R) to remove the underscore (e.g. replace "_Toc" with "Toc") before conversion.
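
A rough Python version of that sweep, assuming the bookmarks are always "_Toc" followed by digits. It also rewrites href="#_Toc..." links so that a table of contents still points at the renamed anchors; that extra step and the function name strip_toc_underscore are additions for illustration, not part of the suggestion above.

```python
import re

# Matches name="_Toc123", id="_Toc123" and href="#_Toc123".
# Group 1 is everything up to the underscore, group 2 is the bookmark itself.
TOC_REF = re.compile(r'(\b(?:name|id)="|\bhref="#)_(Toc\d+)')


def strip_toc_underscore(html: str) -> str:
    """Rename _TocNNN bookmarks to TocNNN in both anchors and the links to them."""
    return TOC_REF.sub(r"\1\2", html)


if __name__ == "__main__":
    sample = ('<p><a href="#_Toc244191671">Bumper Stickers</a></p>'
              '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>')
    print(strip_toc_underscore(sample))
```

Restricting the replacement to attribute values avoids touching any "_Toc" that might appear in the body text.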
Old 04-20-2012, 10:14 AM   #4
kovidgoyal
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Turn off heuristics in the conversion settings.
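
For command-line conversions, the same choice can be sketched as below. The helper build_convert_command and the file names are hypothetical. The flags --enable-heuristics and --disable-italicize-common-cases are believed to match calibre's ebook-convert documentation, but check them against ebook-convert --help for your version. Heuristic processing is opt-in on the command line, so leaving the flag out keeps it off.

```python
import subprocess


def build_convert_command(src, dst, heuristics=False, italicize=False):
    """Build an ebook-convert argument list.

    heuristics=False leaves heuristic processing off (nothing is added).
    With heuristics=True and italicize=False, only the italics rule is
    switched off while the other heuristics stay active.
    """
    cmd = ["ebook-convert", src, dst]
    if heuristics:
        cmd.append("--enable-heuristics")
        if not italicize:
            cmd.append("--disable-italicize-common-cases")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; requires calibre's command-line tools on PATH.
    subprocess.run(build_convert_command("book.html", "book.epub"), check=True)
```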
Old 04-20-2012, 10:23 AM   #5
JSWolf
Resident Curmudgeon
I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
Old 04-20-2012, 10:37 AM   #6
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by JSWolf
I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
I believe you need anchors IF you don't start a new file at chapter boundaries.

By default, the TOF (top of file) is the anchor point.
Old 04-20-2012, 12:45 PM   #7
MikeMJ
Constant Reader
I knew there had to be a simple answer. Turning off Heuristics solved the problem -- without any editing.

Thank you, Kovid!
Old 04-26-2012, 04:19 PM   #8
JSWolf
Resident Curmudgeon
But do you really want to leave in code you don't actually need?
Tags
corruption, html, rtf

All times are GMT -4.