Thread: Regex examples
09-27-2012, 08:35 PM   #16
DiapDealer
Quote:
The HTML code looks like:

Code:
<p class="calibre"><span>bad policy to answer a</span></p>

<p class="calibre"><span>direct question. He kept shaking his head like a china figure.
Ugh. Those empty spans surrounding literally everything are always a pain in the ass. You'll almost surely need to get rid of them first. The problem is that there can be nested spans (italics, bold, etc.) within them, and that makes it quite painful to regex them away without funkifying your "real" formatting spans.
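
To make the nesting problem concrete, here's a rough Python sketch. The sample line and the "italic" class name are made up for illustration (they are not from the book in question), and the pattern is just one naive way someone might try to unwrap only the attribute-less spans:

```python
import re

# Hypothetical line: an empty wrapper span around a "real" italic span.
line = '<p class="calibre"><span>He <span class="italic">kept</span> shaking</span></p>'

# Naive attempt: pair each bare <span> with the next </span> and keep the contents.
naive = re.sub(r"<span>(.*?)</span>", r"\1", line)

# The lazy match stops at the italic span's closing tag, not the wrapper's,
# so the wrapper's closing tag is left behind and now closes the italic span.
print(naive)
# <p class="calibre">He <span class="italic">kept shaking</span></p>
```

The markup still looks well formed, but the italics have quietly spread from "kept" to "kept shaking", which is exactly the kind of damage that's hard to spot afterwards.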

If I have the original text to proof against, I sometimes find it easier (and less frustrating) just to blast ALL the spans away. Every single one. And then redo any italic and/or other special formatting using the physical copy as a guide. It's drastic, yes, but sometimes it's less drastic than fixing the havoc that a regex run on nested spans can wreak.

In one fell swoop, all span tags (opening and closing) are gone when you replace every match of this with nothing:
Code:
</?span[^>]*?>
It all depends on the complexity of the book's formatting, of course. I may not always opt for the "nuclear" span removal approach, but I've done it quite a few times.
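
For anyone who wants to try the "nuclear" pattern outside an editor's search-and-replace box, here's a minimal Python sketch. The function name and the sample line (including the "italic" class) are my own placeholders, not anything from the book above:

```python
import re

# The pattern from this post: any opening or closing span tag, with or without attributes.
SPAN_TAG = re.compile(r"</?span[^>]*?>")

def strip_all_spans(html: str) -> str:
    """Delete every span tag but keep the text that was inside them."""
    return SPAN_TAG.sub("", html)

# Hypothetical sample with a wrapper span and a nested formatting span.
sample = '<p class="calibre"><span>bad policy to answer <span class="italic">a</span></span></p>'
print(strip_all_spans(sample))
# <p class="calibre">bad policy to answer a</p>
```

Note that the italic span goes too, which is the whole point and the whole risk: any formatting carried by spans has to be redone by hand afterwards.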

Use with an appropriate level of trepidation, of course...

Last edited by DiapDealer; 09-27-2012 at 09:44 PM.