06-01-2010, 05:22 PM   #1
Daddy Warpig
Enthusiast
 
Posts: 49
Karma: 14
Join Date: Apr 2010
Device: iPad & iPhone
ePub Output Bug, Caused by MSWord

There is an annoying bug in the Calibre ePub conversion module, linked to a "feature" of MSWord.

The original text:

Code:
to Unseelie Court on King Street and tease
is converted to the following text:

Code:
to
Unseelie Court
on

King Street
and tease
Cause:

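HTML/XHTML generated by MS Word includes "smart tags" (such as <st1:Street> and <st1:address>). When such an HTML file is converted to ePub, these tags are translated into block-level <address> elements, and stray </p> and <p> tags are inserted around them in the new HTML, which splits the original paragraph into several pieces.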

Original HTML code:

Code:
to <st1:Street w:st="on"><st1:address
 w:st="on">Unseelie Court</st1:address></st1:Street> on <st1:Street w:st="on"><st1:address
 w:st="on">King Street</st1:address></st1:Street> and tease
Translated HTML code:

Code:
to</p>
<address class="calibre8"><span>Unseelie</span> Court</address>
<p>on</p>
<address class="calibre8">King Street</address>
<p>and tease
Some solutions for end users:

Either erase the MS Word smart tags before converting, or fix the <p> tags by hand after converting (unzip the ePub, edit the .html or .xhtml files, then rezip).
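Both workarounds can be scripted. The Python below is a minimal sketch, not part of Calibre: strip_smart_tags and rezip_epub are hypothetical helper names, and it assumes the smart tags all use the st1: prefix seen in the HTML above (check your own file, since Word may emit other prefixes). A regular expression is adequate here only because the tags are simply being deleted; it is not a general HTML parser. For the rezip step, the EPUB container format expects the mimetype file to be the first entry in the archive and stored uncompressed, so the sketch writes it first with ZIP_STORED.

```python
import re
import zipfile
from pathlib import Path

# Matches opening and closing Word smart tags with the st1: prefix, e.g.
# <st1:Street w:st="on"> or </st1:address>. [^>]* also matches newlines,
# so tags that Word wraps across lines are caught too.
SMART_TAG = re.compile(r"</?st1:[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Delete st1: smart-tag markup, keeping the text it wraps (hypothetical helper)."""
    return SMART_TAG.sub("", html)


def rezip_epub(src_dir: str, out_path: str) -> None:
    """Re-pack an unzipped ePub folder (hypothetical helper).

    'mimetype' is written first and uncompressed; every other file is
    deflated and stored under its path relative to src_dir.
    """
    root = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        zf.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            arcname = path.relative_to(root).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    word_html = (
        'to <st1:Street w:st="on"><st1:address\n'
        ' w:st="on">Unseelie Court</st1:address></st1:Street> on '
        '<st1:Street w:st="on"><st1:address\n'
        ' w:st="on">King Street</st1:address></st1:Street> and tease'
    )
    print(strip_smart_tags(word_html))  # to Unseelie Court on King Street and tease
```

Run strip_smart_tags on the Word HTML before adding it to Calibre, or run rezip_epub on the unzipped folder after hand-editing the <p> tags.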

This has been reported as ticket #5671 in the Calibre Bug Tracking system.
06-01-2010, 06:05 PM   #2
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Daddy Warpig
There is an annoying bug in the Calibre ePub conversion module, linked to a "feature" of MSWord.
Yup!
I noticed that street names seem to get broken up instead of just italicized. Figured Kovid liked it that way.
06-01-2010, 10:41 PM   #3
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Hi Daddy Warpig,

If you generate your HTML using MS Word, you should use the Save As "Web Page, Filtered" option rather than "Web Page". The smart tags should then not be created in your generated HTML, and there is no need for manual editing.
06-02-2010, 09:03 AM   #4
Dopedangel
Wizard
Posts: 1,759
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
I would also recommend passing the HTML file through HTML Tidy.
That cleans up a lot of the junk Word adds to the file.
I have seen files go down from 1 MB to about 500 KB sometimes.
Similar Threads
[Old Thread] Epub Output: Line Height (started by greenapple in Conversion; 20 replies; last post 01-27-2013 09:27 AM)
EPUB output (started by kovidgoyal in Calibre; 920 replies; last post 02-05-2011 11:59 AM)
EPUB output justification (started by toki08 in Calibre; 10 replies; last post 01-08-2011 04:14 PM)
Seems Amazon have caused an epub price war in the UK (started by ceebee_uk in General Discussions; 11 replies; last post 09-27-2010 04:20 AM)
epub output metadata (started by troymc in Calibre; 5 replies; last post 05-22-2010 12:23 AM)


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.