Quote:
The HTML code looks like:
Code:
<p class="calibre"><span>bad policy to answer a</span></p>
<p class="calibre"><span>direct question. He kept shaking his head like a china figure.
Ugh. Those empty spans surrounding literally everything are always a pain in the ass. You'll almost surely need to get rid of them first. The problem is ... there can be nested spans (italics/bolds/etc.) within them. And that makes it quite painful to regex them away (without funkifying your "real" formatting spans).
If I have the original text to proof against, I sometimes find it easier (and less frustrating) just to blast ALL the spans away. Every single one. And then redo any italic and/or other special formatting using the physical copy as a guide. It's drastic, yes, but sometimes it's less drastic than fixing the havoc that a regex run on nested spans can wreak.
In one fell swoop, all span tags (opening and closing) ... gone (when you replace them with nothing, of course):
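For illustration, here is a minimal sketch of that kind of blanket removal, written in Python rather than typed into an editor's find/replace box. The pattern and the `strip_spans` name are my own assumptions, not necessarily the exact expression meant above; the sample line is taken from the quoted HTML.

```python
import re

# Assumed pattern: matches any opening or closing span tag,
# with or without attributes. \b keeps it from matching other
# tag names that merely start with "span".
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def strip_spans(html: str) -> str:
    """Delete every <span ...> and </span> tag, keeping the text inside."""
    return SPAN_TAG.sub("", html)


sample = '<p class="calibre"><span>bad policy to answer a</span></p>'
print(strip_spans(sample))
# prints: <p class="calibre">bad policy to answer a</p>
```

Because the tags are deleted one by one rather than matched in pairs, nesting can't trip it up. The flip side is that italic and bold spans vanish along with the empty ones, which is exactly why the formatting has to be redone by hand afterwards.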
It all depends on the complexity of the book's formatting. I may not always opt for the "nuclear" span removal approach, but I've done it quite a few times. Use with an appropriate level of trepidation, of course...