Old 06-01-2010, 06:22 PM   #1
Daddy Warpig
ePub Output Bug, Caused by MSWord

There is an annoying bug in the Calibre ePub conversion module, linked to a "feature" of MSWord.

This original text:

Code:
to Unseelie Court on King Street and tease

is converted to the following text:

Code:
to
Unseelie Court
on

King Street
and tease

Cause:

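HTML/XHTML generated by MS Word includes "smart tags." When such an HTML file is converted to ePub, these tags are translated (here, st1:address becomes an <address> element), but errant </p> and <p> tags are inserted around them, which splits the sentence into separate paragraphs.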

Original HTML code:

Code:
to <st1:Street w:st="on"><st1:address
 w:st="on">Unseelie Court</st1:address></st1:Street> on <st1:Street w:st="on"><st1:address
 w:st="on">King Street</st1:address></st1:Street> and tease

Translated HTML code:

Code:
to</p>
<address class="calibre8"><span>Unseelie</span> Court</address>
<p>on</p>
<address class="calibre8">King Street</address>
<p>and tease

Some solutions for end users:

Either remove the MS Word smart tags before converting, or fix the <p> tags by hand after converting (unzip the ePub, edit the .html or .xhtml files, then rezip).
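
For the first workaround, here is a minimal sketch of stripping the smart tags from the Word HTML before handing it to Calibre. It is not part of Calibre. It assumes the smart tags all use the st1: prefix seen in the sample above (other Word files may use a different prefix), and the function name strip_smart_tags is just an illustrative choice.

```python
import re

# Matches opening or closing tags in the "st1:" namespace, e.g.
# <st1:Street w:st="on"> or </st1:address>. The attribute part may
# span several lines, as it does in Word's output.
# Assumption: the smart tags use the "st1:" prefix shown in the sample.
SMART_TAG = re.compile(r"</?st1:[A-Za-z0-9_]+\b[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Remove st1:* tags but keep the text they wrap."""
    return SMART_TAG.sub("", html)


sample = (
    'to <st1:Street w:st="on"><st1:address\n'
    ' w:st="on">Unseelie Court</st1:address></st1:Street> on '
    '<st1:Street w:st="on"><st1:address\n'
    ' w:st="on">King Street</st1:address></st1:Street> and tease'
)

cleaned = strip_smart_tags(sample)
print(cleaned)  # to Unseelie Court on King Street and tease
```

Only the tags are dropped, so the wrapped text stays inline in its original paragraph.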

This has been reported as ticket #5671 in the Calibre Bug Tracking system.
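
For the second workaround (unzip, edit, rezip), the repackaging step could be scripted roughly as below. This is only a sketch: rewrite_epub and its fix argument are hypothetical names, the content files are assumed to be UTF-8, and the actual <p> repair is left to whatever function you pass in. The one detail worth noting is that the ePub container expects the "mimetype" entry to come first and to be stored uncompressed.

```python
import zipfile


def rewrite_epub(src_path, dst_path, fix):
    """Copy an ePub, passing each .html/.xhtml entry through fix(text)."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        names = src.namelist()
        # The ePub container format expects "mimetype" to be the first
        # entry in the archive and to be stored without compression.
        if "mimetype" in names:
            dst.writestr("mimetype", src.read("mimetype"),
                         compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            data = src.read(name)
            if name.lower().endswith((".html", ".xhtml")):
                # Assumption: content files are UTF-8 encoded.
                data = fix(data.decode("utf-8")).encode("utf-8")
            dst.writestr(name, data)
```

A call might look like rewrite_epub("book.epub", "book-fixed.epub", my_fix), where my_fix takes the text of one content file and returns the corrected text. Writing to a new file leaves the original ePub untouched in case the fix goes wrong.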
Old 06-01-2010, 07:05 PM   #2
theducks
Yup!
I noticed that street names seem to get broken up instead of just italicized. Figured Kovid liked it that way.
Old 06-01-2010, 11:41 PM   #3
jackie_w
Hi Daddy Warpig,

If you generate your HTML using MS Word, you should use the Save As "Web Page, Filtered" option rather than Save As "Web Page". The smart tags should then not appear in the generated HTML, so no manual editing is needed.
Old 06-02-2010, 10:03 AM   #4
Dopedangel
I would also recommend passing the HTML file through HTML Tidy.
That cleans up much of the crap Word adds to the file.
I have seen files shrink from 1 MB to about 500 KB sometimes.