The HTML code looks like:
<p class="calibre"><span>bad policy to answer a</span></p>
<p class="calibre"><span>direct question. He kept shaking his head like a china figure.</span></p>
Ugh. Those empty spans surrounding literally everything are always a pain in the ass. You'll almost surely need to get rid of them first. The problem is that there can be nested spans (italics, bold, etc.) inside them, and that makes it quite painful to regex them away without funkifying your "real" formatting spans.
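To show what I mean by funkifying, here's a rough Python sketch. The sample line and the "italic" class name are made up for illustration, and the pattern is just the obvious naive attempt, not something anyone should actually run on a book.

```python
import re

# Hypothetical sample: an empty wrapper span with a "real" italic span nested inside.
line = '<span>He was <span class="italic">very</span> tired.</span>'

# Naive attempt: unwrap each attribute-less span, keeping whatever is inside it.
naive = re.compile(r"<span>(.*?)</span>")
result = naive.sub(r"\1", line)

print(result)
# He was <span class="italic">very tired.</span>
```

The lazy match pairs the outer opening tag with the inner closing tag, so the italics now swallow the rest of the sentence. That's the sort of damage that is tedious to find and undo afterwards.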
If I have the original text to proof against, I sometimes find it easier (and less frustrating) just to blast ALL the spans away. Every single one. And then redo any italics and other special formatting using the physical copy as a guide. It's drastic, yes, but sometimes it's less drastic than fixing the havoc that a regex run on nested spans can wreak.
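As a sketch of the blast-everything approach, something like the following should do it in Python. The pattern `</?span\b[^>]*>` is my stand-in for "any opening or closing span tag, with or without attributes", and the helper name `strip_all_spans` and the sample line are made up for the example. The same pattern ought to work in most editors' regex find/replace, but treat that as an assumption and test on a copy first.

```python
import re

# Matches <span>, <span class="...">, and </span>; \b keeps it from touching other tags.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

def strip_all_spans(html: str) -> str:
    """Remove every span tag (opening and closing) but keep the text between them."""
    return SPAN_TAG.sub("", html)

# Hypothetical sample with a nested formatting span.
sample = '<p class="calibre"><span>He kept <span class="italic">shaking</span> his head.</span></p>'
cleaned = strip_all_spans(sample)

print(cleaned)
# <p class="calibre">He kept shaking his head.</p>
```

Note that the italics go too, which is the whole point of the warning: any formatting that lived in a span has to be redone by hand.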
In one fell swoop, all span tags (opening and closing) ... gone (when you replace each match with nothing).
It all depends on the complexity of the book's formatting, of course. I may not always opt for the "nuclear" span removal approach, but I've done it quite a few times.
Use with an appropriate level of trepidation...