#1 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Ignoring MS Office Smart Tags during convert?
I've got a huge HTML document that contains MS Office Smart Tags, e.g.
<st1:Street w:st="on"><st1:address w:st="on">University Place</st1:address></st1:Street>
When Calibre converts this to EPUB, the formatting goes a little wonky (University Place ends up on its own line, or something similar). Is there a way to get Calibre to ignore these tags altogether? Thanks!
#2 |
Grand Sorcerer
Posts: 6,246
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
If you have MS Word, you could try the following:
1. Open the HTML file with Word.
2. Choose File > Save As and select the type "Web Page, Filtered".
This should give you a new HTML file without the MS smart tags.
#3 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Thanks.
I've only got Linux available (and no Word). I could write a regex and use sed to strip them out, but since the files are passed to me already in a zip file, I bring them directly into Calibre and convert to EPUB. I guess I'm kinda hoping that Calibre can handle this on its own. Maybe I'm just being too lazy!
#4 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can put regexps into the remove header and footer options to remove whatever you like.
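For anyone trying this, here is a rough sketch of the kind of expression that could go in there. calibre's regexps use Python syntax, so the pattern can be checked in plain Python first. The pattern itself, the `st1:` prefix (taken from the example in the first post), and the helper name `strip_smart_tags` are assumptions for illustration, not something calibre ships. It matches only the tags, so the text between them is kept.

```python
import re

# Assumed pattern: any opening or closing tag whose name starts with "st1:",
# e.g. <st1:Street w:st="on"> or </st1:address>. Other smart-tag prefixes
# would need their own alternative in the pattern.
SMART_TAG = re.compile(r"</?st1:[^>]*>", re.IGNORECASE)


def strip_smart_tags(html: str) -> str:
    """Remove st1: tags but keep the text they wrap (hypothetical helper)."""
    return SMART_TAG.sub("", html)


sample = (
    '<st1:Street w:st="on"><st1:address w:st="on">'
    "University Place</st1:address></st1:Street>"
)
print(strip_smart_tags(sample))  # prints: University Place
```

If the pattern behaves as expected on a sample, the bare expression `</?st1:[^>]*>` is what would be pasted into the conversion option. Where exactly that option lives may differ between calibre versions, so check the conversion dialog of the version you have.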
#5 |
Zealot
Posts: 123
Karma: 76
Join Date: Feb 2009
Device: Sony PRS-505, PRS-350
|
Figured it out, works great!
Last edited by notsure; 12-15-2009 at 10:41 PM. Reason: d'oh! |