12-15-2009, 07:21 PM #1
notsure
Zealot
notsure has learned how to buy an e-book online
 
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
Ignoring MS Office Smart Tags during convert?

I've got a huge HTML document which contains MS Office Smart Tags, e.g.:

<st1:Street w:st="on"><st1:address w:st="on">University Place</st1:address></st1:Street>

When Calibre converts this to epub, the formatting goes a little wonky ("University Place" ends up on its own line, or something similar).

Is there a way to get Calibre to ignore these tags altogether?

Thanks!
12-15-2009, 08:11 PM #2
jackie_w
Grand Sorcerer
jackie_w ought to be getting tired of karma fortunes by now.
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
If you have MS Word, you could try the following:

1. Open the HTML file with Word.

2. Then use File > Save As and choose the type "Web Page, Filtered".

This should give you a new HTML file without the MS smart tags.
12-15-2009, 08:22 PM #3
notsure
Thanks.

I've only got Linux available (and no Word).

I could write a regex and use sed to strip them out, but since the files are passed to me already in a zip file, I bring them directly into Calibre and convert to epub.

I guess I'm kinda hoping that Calibre can handle this on its own. Maybe I'm just being too lazy!
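
For reference, the external cleanup described in this post could be sketched roughly as follows, in Python rather than sed. This is only an illustration under assumptions: the function name `clean_zip` and the pattern are made up for this example, the smart tags are assumed to always use the `st1:` prefix, and the HTML files are assumed to be in an ASCII-compatible encoding so the tags can be matched as raw bytes.

```python
import re
import zipfile

# Assumption: every smart tag is written with the st1: prefix,
# e.g. <st1:Street w:st="on"> ... </st1:Street>.
SMART_TAG = re.compile(rb"</?st1:[^>]*>")

def clean_zip(src_path: str, dst_path: str) -> None:
    """Copy a zip archive, stripping st1: tags from its .htm/.html members."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename.lower().endswith((".htm", ".html")):
                # Drop only the tags themselves; the enclosed text is kept.
                data = SMART_TAG.sub(b"", data)
            dst.writestr(info, data)
```

Calling `clean_zip("incoming.zip", "cleaned.zip")` (both file names are placeholders) would write a second archive that can then be added to Calibre in the usual way.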
12-15-2009, 08:31 PM #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can put regexps into the remove header and footer options to remove whatever you like.
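
As a rough illustration of the kind of regexp that could go into those options: the pattern below is an assumption, not something posted in this thread. It matches the opening and closing tags in the `st1:` namespace while leaving the text between them alone. Calibre's regexps use Python syntax, so a pattern can be tried out in plain Python first (the helper name `strip_smart_tags` is made up for this sketch):

```python
import re

# Hypothetical pattern: matches any opening or closing st1: tag,
# e.g. <st1:Street w:st="on"> or </st1:address>, but not the text inside.
SMART_TAG = re.compile(r"</?st1:[^>]*>")

def strip_smart_tags(html: str) -> str:
    """Remove the smart-tag markup and keep the enclosed text."""
    return SMART_TAG.sub("", html)

sample = ('<st1:Street w:st="on"><st1:address w:st="on">'
          'University Place</st1:address></st1:Street>')
print(strip_smart_tags(sample))  # University Place
```

If the smart tags in a given document use a different prefix, the `st1:` part of the pattern would need adjusting.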
12-15-2009, 10:01 PM #5
notsure
Figured it out, works great!

Last edited by notsure; 12-15-2009 at 10:41 PM. Reason: d'oh!
All times are GMT -4.