Old 04-19-2012, 10:55 PM   #1
MikeMJ
Constant Reader
MikeMJ began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Apr 2012
Device: HTC Sensation 4G (Kindle software)
Headings going awry

I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors.

The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:

<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>

However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:

<h1><a name="&lt;i&gt;Toc244191671”&gt;&lt;/a&gt;New York Bumper Stickers&lt;/h1&gt;

As a result:
  • The paragraphs that follow the <h1> are appearing in the large font of the <h1>
  • The reference from the <a> is appearing as text
  • The text of the <h1> is not visible

In other cases, a very similar heading changes from:

<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>

to:

<h1 style="margin-top:1em;margin-bottom:1em;"><a name="&lt;i&gt;Toc244191667”&gt;&lt;/a&gt;&lt;a name=">Toc104266959”&gt;You know you’re from </a>Jersey when …</h1>

In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.

Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text:

&lt;i&gt;

being inserted before the text of the anchor name attribute?
Old 04-19-2012, 11:29 PM   #2
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.

Posts: 38,507
Karma: 19637653
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Aura H2), Sony PRS-650, Sony PRS-T1, nook STR, iPad 1, iPhone 5
My suggestion is to just delete the <a name="_Toc244191671"> type anchors from the headings. You don't actually need them.
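A minimal Python sketch of that cleanup, run on the saved HTML before conversion. The function name `strip_toc_anchors` and the regex are illustrative assumptions, not part of calibre; the pattern only covers the simple, non-nested `<a name="_Toc...">` form shown in the first post.

```python
import re

# Hypothetical helper: matches Word's <a name="_Toc123">...</a> anchors,
# whether empty or wrapping part of the heading text.
TOC_ANCHOR = re.compile(r'<a name="_Toc\d+">(.*?)</a>', re.DOTALL)

def strip_toc_anchors(html):
    """Remove _Toc anchors but keep any text they wrapped."""
    return TOC_ANCHOR.sub(r'\1', html)

heading = ('<h1><a name="_Toc244191667"></a>'
           '<a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>')

print(strip_toc_anchors(heading))
```

Note that any links in the document that point at these anchors would no longer resolve after this, so it only makes sense if nothing refers to them.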
Old 04-20-2012, 10:08 AM   #3
dwig
Guru
dwig ought to be getting tired of karma fortunes by now.

Posts: 996
Karma: 1843322
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Dell Venue 8 Pro - Retired:Kindle 3, Clie UX50, T415, ...
Quote:
Originally Posted by MikeMJ
I am trying to...
"_Toc...

the heading has been changed to:
"&lt;i&gt;Toc...

Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text:

"&lt;i&gt;

being inserted before the text of the anchor name attribute?
Quite obviously, the underscore preceding the "Toc" is giving calibre grief. Calibre seems to be incorrectly interpreting it as the antique (still frequently seen) use of underscores wrapping a portion of text to indicate italics. Calibre is replacing the underscore with the <i> tag. Later processing is then converting the < to &lt; and the > to &gt;.
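To illustrate the kind of substitution being described, here is a minimal Python sketch. The regex and the name `naive_italicize` are made up for illustration; this is not calibre's actual heuristics code, only a rule of the same general shape, applied to the second heading from the first post.

```python
import re

# Hypothetical rule: treat _text_ as italics. NOT calibre's real implementation.
UNDERSCORE_ITALICS = re.compile(r'_([^_\n]+)_')

def naive_italicize(html):
    """Wrap whatever sits between two underscores in <i>...</i>."""
    return UNDERSCORE_ITALICS.sub(r'<i>\1</i>', html)

heading = ('<h1><a name="_Toc244191667"></a>'
           '<a name="_Toc104266959">You know you’re from </a>Jersey when …</h1>')

print(naive_italicize(heading))
```

The two underscores in the `name` attributes are taken as an italics pair, so `<i>` and `</i>` land inside attribute values. A later parsing step then has to escape or repair the broken markup, which would be consistent with the output in the Debug\Parsed folder.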

One fix is to sweep the document with S&R to remove the underscore (e.g. replace "_Toc" with "Toc") before conversion.
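
As a rough sketch, the same search and replace could be scripted in Python before handing the file to calibre. The function name `rename_toc_anchors` is hypothetical, and the `href` link in the sample is an assumed example of a table-of-contents link, not something taken from the original document.

```python
def rename_toc_anchors(html):
    """Drop the leading underscore from Word's _Toc anchor names and links."""
    html = html.replace('name="_Toc', 'name="Toc')
    # Keep internal links pointing at the renamed anchors.
    return html.replace('href="#_Toc', 'href="#Toc')

# Assumed sample: a TOC link followed by the heading it targets.
sample = ('<p><a href="#_Toc244191671">New York Bumper Stickers</a></p>\n'
          '<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>')

print(rename_toc_anchors(sample))
```

Renaming both the `name` and the matching `href` keeps any internal links working, which a blanket removal of the anchors would not.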
Old 04-20-2012, 11:14 AM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 26,435
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Turn off heuristics in the conversion settings.
Old 04-20-2012, 11:23 AM   #5
JSWolf
Resident Curmudgeon
I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
Old 04-20-2012, 11:37 AM   #6
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.

Posts: 15,256
Karma: 6020307
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by JSWolf
I do prefer my solution to actually remove the <a> from the code. It's not needed and just adds bloat.
I believe you need anchors if you don't start a new file at chapter boundaries.

By default, TOF (top of file) is the anchor point.
Old 04-20-2012, 01:45 PM   #7
MikeMJ
Constant Reader
I knew there had to be a simple answer. Turning off Heuristics solved the problem -- without any editing.

Thank you, Kovid!
Old 04-26-2012, 05:19 PM   #8
JSWolf
Resident Curmudgeon
But do you really want to leave in code you don't actually need?

Tags
corruption, html, rtf

All times are GMT -4.