@Toxaris - I agree that when it works, Tidy does some nice things that save you work you would otherwise do with regexes.
*However*, what I suspect many people do not know is that Tidy *will* destroy HTML content in some circumstances. Or at least, that is what user_none pointed to as the cause when I reported this. Rather more concerning, the report has a status of "CannotFix", and the suggested workaround is to turn Tidy off.
So if your document contains smart tag remnants like this, which I have seen come out of LIT conversions:
<span w:st="on"><span w:st="on">Wash</span></span>
then *all* content on that HTML page from that point onwards will be removed. Utterly destroyed. That can be entire chapters' worth (which you might notice) or small amounts (which you likely won't).
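If you want to check a book before letting Tidy near it, something like the rough Python sketch below could flag the risky markup. It only *detects*. It does not fix anything in Tidy or Sigil. It also assumes, based solely on the example above, that any attribute with a `w:` prefix (such as `w:st`) is worth flagging. That is not a confirmed list of what triggers the bug. `find_smart_tags` is a hypothetical helper, not part of Sigil.

```python
from html.parser import HTMLParser


class SmartTagFinder(HTMLParser):
    """Records (line, tag, attribute) for each element carrying a Word-style
    'w:' attribute such as w:st. Assumption: the 'w:' prefix is what matters."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if name.startswith("w:"):
                line, _col = self.getpos()  # line of the opening '<'
                self.hits.append((line, tag, name))
                break  # one hit per element is enough


def find_smart_tags(html_text):
    """Hypothetical helper: return every smart-tag remnant found in html_text."""
    finder = SmartTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits


sample = '<p>Soap</p>\n<p><span w:st="on"><span w:st="on">Wash</span></span></p>\n'
print(find_smart_tags(sample))  # [(2, 'span', 'w:st'), (2, 'span', 'w:st')]
```

You would run this over each XHTML file in the unzipped EPUB and strip any hits by hand before doing anything that invokes Tidy.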
In my opinion, Tidy should either be off by default, be fixed to address this issue, or be removed from Sigil. This is just far too dangerous a flaw: you edit a book in Sigil thinking you are making a small change and end up losing book content forever...