You are welcome. Regexes can make the otherwise mindless task of tidying up a book conversion more interesting. OK, not that much, but a little bit.
There is a big mental checklist of stuff I go through with every epub I clean up (not all of it done with regex, of course), including...
- Stripping any "faked" indenting done with &nbsp; entities and replacing it with an indented, justified style
- Ensuring all chapters are given a heading style
- Stripping out nested div tags and replacing divs with paragraphs
- Stripping out <span> tags that become unnecessary once the paragraph's CSS style is set correctly
- Recombining paragraphs that were split in the middle of a sentence
- Replacing incorrect or inadequate quotes around speech. For instance, I don't like speech written as 'Some quote' (or worse, an inconsistent mix of " ` ' etc. from a bad OCR conversion) and prefer to see “Some quote”
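As a rough illustration of two items on that list, here is a sketch using Python's re module rather than Sigil's Find/Replace dialog. The assumptions are mine: that "faked" indents are leading &nbsp; entities or whitespace right after an opening <p> tag, and that a paragraph split mid-sentence ends in a lowercase letter or comma and the next one starts lowercase. Real conversions will need the patterns tuned.

```python
import re

# Assumption: a "faked" indent is one or more &nbsp; entities and/or
# whitespace characters directly after an opening <p> or <p ...> tag.
FAKE_INDENT = re.compile(r'(<p(?:\s[^>]*)?>)(?:&nbsp;|\s)+')

# Assumption: a paragraph broken mid-sentence ends with a lowercase letter
# or comma, and the following paragraph starts with a lowercase letter.
# \s* tolerates the tags being on the same line or on separate lines.
BROKEN_PARA = re.compile(r'([a-z,])</p>\s*<p(?:\s[^>]*)?>([a-z])')


def strip_fake_indents(html: str) -> str:
    # Keep the opening tag and drop the leading spaces; the indent should
    # then come from the paragraph's CSS (e.g. text-indent) instead.
    return FAKE_INDENT.sub(r'\1', html)


def join_broken_paragraphs(html: str) -> str:
    # Replace the "</p> <p ...>" between the two halves with a single space.
    return BROKEN_PARA.sub(r'\1 \2', html)


sample = (
    '<p class="calibre2">&nbsp;&nbsp;&nbsp;It was a dark and</p>\n'
    '<p class="calibre2">stormy night.</p>'
)
cleaned = join_broken_paragraphs(strip_fake_indents(sample))
print(cleaned)
# <p class="calibre2">It was a dark and stormy night.</p>
```

The lowercase/comma heuristic deliberately leaves paragraphs that end in a full stop alone, which is also why some broken sentences will still need eyeballing.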
There are still cases you won't catch without eyeballing the text manually, but you can fairly quickly turn a very badly formatted document into one that is considerably more pleasant to read.
You mentioned multi-line paragraphs. Hopefully you saw from my example above that you can cope with those in Sigil just by using \s+ (one or more whitespace characters, which covers spaces, tabs and line breaks). You don't have to think about newline characters like \r or \n in Sigil; just put \s+ between the closing and opening tags and your expression will match across lines.
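To show why \s+ is enough, here is the same idea tested with Python's re module. Sigil uses its own regex engine, but \s is the standard whitespace class there too; the sample HTML is made up for illustration.

```python
import re

html = '<p class="calibre2">first half of a</p>\n<p class="calibre2">sentence</p>'

# A literal space does not match the newline between the two tags...
literal = re.search(r'</p> <p', html)

# ...but \s+ (one or more whitespace characters, including \r and \n)
# does, so the pattern works whether the tags share a line or not.
flexible = re.search(r'</p>\s+<p', html)

print(literal)               # None
print(flexible is not None)  # True
```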
One final point, which has been mentioned in a few other threads: you should tick the "Minimal Matching" checkbox on the Find/Replace dialog, which is enabled when you choose regular expressions. I haven't needed to uncheck it since finding out its purpose, so it's pretty much set and forget. Some expressions only work correctly with it on. For instance, say your document looks like this, with some pointless span tag pairs to remove:
<p class="calibre2"><span class="none">Blah blah text</span></p>
<p class="calibre2"><span class="none">More text</span></p>
Find: <span class="none">(.*)</span>
Replace: \1
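This says: find *any* text between a <span class="none"> and a </span> tag, and replace the whole match with just that text, thereby removing the surrounding tags. It only works "correctly" with "Minimal Matching" turned on. Without it, .* is greedy and grabs as much text as it can, so a single match can run from the first <span class="none"> all the way to the last </span>, leaving the tags in between untouched.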
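Python's re module has no "Minimal Matching" checkbox; the equivalent there is the non-greedy quantifier .*? in place of .*. This sketch runs the Find/Replace pair above on the two sample lines both ways. re.DOTALL is used so that . can cross the line break, which makes the greedy failure visible; whether . crosses newlines in Sigil depends on its engine and settings, so treat that part as an assumption.

```python
import re

html = (
    '<p class="calibre2"><span class="none">Blah blah text</span></p>\n'
    '<p class="calibre2"><span class="none">More text</span></p>'
)

# Greedy: (.*) runs to the LAST </span>, so one match swallows both lines
# and the inner </span> ... <span class="none"> end up inside \1.
greedy = re.sub(r'<span class="none">(.*)</span>', r'\1', html, flags=re.DOTALL)

# Non-greedy ("minimal"): (.*?) stops at the FIRST </span>, so each span
# pair is matched and removed separately.
lazy = re.sub(r'<span class="none">(.*?)</span>', r'\1', html, flags=re.DOTALL)

print(greedy)
# <p class="calibre2">Blah blah text</span></p>
# <p class="calibre2"><span class="none">More text</p>
print(lazy)
# <p class="calibre2">Blah blah text</p>
# <p class="calibre2">More text</p>
```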