12-15-2009, 07:21 PM | #1 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Ignoring MS Office Smart Tags during convert?
I've got a huge HTML document which contains MS Office Smart Tags, e.g.
<st1:Street w:st="on"><st1:address w:st="on">University Place</st1:address></st1:Street>
When Calibre converts this to epub the formatting goes a little wonky (University Place would appear on its own line or something similar). Is there a way to get Calibre to ignore these tags altogether? Thanks! |
12-15-2009, 08:11 PM | #2 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
If you have MS Word, you could try the following:
1. Open the HTML file with Word.
2. Then use File > Save As and choose the type "Web Page, Filtered".
This should give you a new HTML file without the MS smart tags. |
12-15-2009, 08:22 PM | #3 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Thanks.
I've only got Linux available (and no Word). I could write a regex and use sed to strip them out, but since the files are passed to me already in a zip file, I bring them directly into Calibre and convert to epub. I guess I'm kinda hoping that Calibre can handle this on its own. Maybe I'm just being too lazy! |
12-15-2009, 08:31 PM | #4 |
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can put regexps into the remove header and footer options to remove whatever you like.
|
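For anyone landing here later, here is a rough sketch of the kind of regex that could do the job. It is written as a standalone Python snippet so it can be tried outside Calibre. The `st1` prefix is taken from the example in the first post, the function name `strip_smart_tags` is made up for illustration, and this is not necessarily the expression the original poster ended up using. The pattern removes only the tags and leaves the wrapped text in place.

```python
import re

# Matches opening and closing tags with the "st1:" prefix, e.g.
# <st1:Street w:st="on"> or </st1:address>.
# Assumption: the smart tags use the "st1" prefix shown in the first post;
# other documents may use a different prefix.
SMART_TAG = re.compile(r"</?st1:[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Remove smart-tag markup but keep the text it wraps."""
    return SMART_TAG.sub("", html)


sample = (
    '<st1:Street w:st="on"><st1:address w:st="on">'
    "University Place</st1:address></st1:Street>"
)
print(strip_smart_tags(sample))  # University Place
```

The bare pattern `</?st1:[^>]*>` may also be usable in Calibre's regex fields mentioned above, though that has not been checked here.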
12-15-2009, 10:01 PM | #5 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Figured it out, works great!
Last edited by notsure; 12-15-2009 at 10:41 PM. Reason: d'oh! |
|