Old 06-01-2010, 05:22 PM   #1
Daddy Warpig
ePub Output Bug, Caused by MSWord

There is an annoying bug in the Calibre ePub conversion module, linked to a "feature" of MSWord.

This original text:

Code:
to Unseelie Court on King Street and tease
is converted to the following text:

Code:
to
Unseelie Court
on

King Street
and tease
Cause:

HTML/XHTML generated by MS Word includes "smart tags." When such a file is converted to ePub, the smart tags are translated, but errant <p> tags are inserted into the new HTML, closing the paragraph before each tagged phrase and reopening it afterwards.

Original HTML code:

Code:
to <st1:Street w:st="on"><st1:address
 w:st="on">Unseelie Court</st1:address></st1:Street> on <st1:Street w:st="on"><st1:address
 w:st="on">King Street</st1:address></st1:Street> and tease
Translated HTML code:

Code:
to</p>
<address class="calibre8"><span>Unseelie</span> Court</address>
<p>on</p>
<address class="calibre8">King Street</address>
<p>and tease
Some solutions for end users:

Either erase the MSWord smart tags before converting, or fix the <p> tags by hand after converting (unzip ePub, edit .html or .xhtml files, rezip).
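
For the first workaround, here is a minimal Python sketch that strips the smart-tag wrappers from the Word HTML before it goes into Calibre. It is not part of Calibre; the function name `strip_smart_tags` is made up for illustration, and it assumes the smart tags use the `st1:` prefix seen in the sample above (other documents may use a different prefix).

```python
import re

# Matches opening or closing tags with the "st1:" prefix, e.g.
# <st1:Street w:st="on"> or </st1:address>. The prefix is an assumption
# taken from the sample in this post. [^>]* also spans line breaks, which
# matters because Word wraps these tags across lines.
SMART_TAG = re.compile(r"</?st1:[A-Za-z0-9_]+\b[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Remove smart-tag wrappers but keep the text inside them."""
    return SMART_TAG.sub("", html)


sample = (
    'to <st1:Street w:st="on"><st1:address\n'
    ' w:st="on">Unseelie Court</st1:address></st1:Street> on '
    '<st1:Street w:st="on"><st1:address\n'
    ' w:st="on">King Street</st1:address></st1:Street> and tease'
)

print(strip_smart_tags(sample))
# prints: to Unseelie Court on King Street and tease
```

A regex is enough here because only the tags themselves are deleted and their contents are left alone; for anything more involved, a real HTML parser would be the safer choice.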

This has been reported as ticket #5671 in the Calibre Bug Tracking system.
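
For the second workaround (unzip, edit, rezip), the rezip step is the one that tends to go wrong: the ePub container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed. A rough sketch using only Python's standard `zipfile` module follows; the function names and paths are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path: str, dest_dir: str) -> None:
    """Extract every file in the ePub into dest_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(dest_dir)


def repack_epub(src_dir: str, out_path: str) -> None:
    """Zip src_dir back into an ePub.

    out_path should live outside src_dir, otherwise the archive
    would try to include itself.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        # mimetype goes first and is stored without compression.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


# Example usage (hypothetical paths):
#   unpack_epub("book.epub", "book_unpacked")
#   ... fix the <p> tags in the .html/.xhtml files by hand ...
#   repack_epub("book_unpacked", "book_fixed.epub")
```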
Old 06-01-2010, 06:05 PM   #2
theducks
Yup!
I noticed that street names seem to get broken up instead of just italicized. Figured Kovid liked it that way.
Old 06-01-2010, 10:41 PM   #3
jackie_w
Hi Daddy Warpig,

If you generate your HTML using MSWord, you should use the Save As "Web Page, Filtered" option rather than Save As "Web Page". The "smart tags" should then not be created in your generated HTML, and there is no need for manual editing.
Old 06-02-2010, 09:03 AM   #4
Dopedangel
Wizard
Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.Dopedangel ought to be getting tired of karma fortunes by now.
 
Dopedangel's Avatar
 
Posts: 1,105
Karma: 8671315
Join Date: Dec 2006
Location: Singapore
Device: Coolreader(Nexus 5)\Coolreader(Nook Touch)
I would also recommend passing the HTML file through HTML Tidy.
That cleans up much of the crap Word adds to the file.
I have seen files go down from 1 MB to about 500 KB sometimes.
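
As a rough illustration of the Tidy step, here is a small Python wrapper around the `tidy` command-line tool. It assumes `tidy` is installed and on the PATH; the function names and file names are hypothetical. The `--word-2000 yes` option is Tidy's own setting for stripping Word-specific markup, and the exact clean-up it performs depends on the Tidy version installed.

```python
import shutil
import subprocess


def tidy_command(src: str, dst: str) -> list:
    """Build the tidy command line for cleaning a Word-generated HTML file."""
    return [
        "tidy",
        "-q",                   # suppress non-essential output
        "-asxhtml",             # write well-formed XHTML
        "--word-2000", "yes",   # strip Word-specific markup
        "-o", dst,              # output file
        src,
    ]


def run_tidy(src: str, dst: str) -> int:
    """Run tidy and return its exit code (0 = clean, 1 = warnings, 2 = errors)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on PATH")
    # No check=True: tidy exits with 1 when it only has warnings.
    return subprocess.run(tidy_command(src, dst)).returncode


# Example usage (hypothetical file names):
#   code = run_tidy("book_from_word.html", "book_clean.html")
#   if code == 2:
#       print("tidy reported errors; inspect the input file")
```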