For completeness, here's an example using an HTML/XML parser in my programming language of choice, R. I saved one of the HTML snippets from this thread to a file called "test.html".
Code:
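## Install the xml2 package (only needed once) and load it
install.packages("xml2")
library(xml2)
## Read in the HTML file; RECOVER tells the parser to recover from malformed markup
arf <- read_html("~/test.html", options = "RECOVER")
## Find all span nodes using an XPath selector
spans <- xml_find_all(arf, "//span")
## Replace each span with its contents (spans and child nodes are paired by position,
## so this assumes every span holds exactly one child, e.g. a single text node)
xml_replace(spans, xml_contents(spans))
## Write out the file
write_html(arf, "~/testOut.html")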
This nets us an HTML file that looks like:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body id="xx" lang="en-US" style="width:396px;height:612px" xml:lang="en-US">
<div class="Basic-Text-Frame" id="_idContainer250">
<div style="width:5760px;height:9540px;position:absolute;top:0px;left:0px;-webkit-transform-origin: 0% 0%; -webkit-transform: translate(0px,5.83px) rotate(0deg) scale(0.05);transform-origin: 0% 0%; transform: translate(0px,5.83px) rotate(0deg) scale(0.05);">
<p class="Chapter-Title ParaOverride-1">Time to Forgive</p>
<p class="Drop-Cap ParaOverride-1">“I want you to imagine your reflection in a beautiful mirror—the person who caused
</p>
</div>
</div>
</body></html>
It's way easier to edit HTML/XML with a parser than with regex, since the parser already knows which tags are nested inside which.
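One caveat: xml_replace() pairs each span with one child node by position, so a span holding several children (say, text plus a <b> tag) or spans nested inside spans can get mixed up. Here's a rough sketch of a more defensive unwrap using only xml2 calls (xml_find_all(), xml_contents(), xml_add_sibling(), xml_remove()). unwrap_spans() is just a name I made up, and the test snippet is invented, not from this thread.

```r
library(xml2)

## Hypothetical helper: replace every <span> with its child nodes,
## keeping the text and any nested tags (like <b>) where they were.
unwrap_spans <- function(doc) {
  ## Reverse document order so inner spans are unwrapped before outer ones
  spans <- rev(xml_find_all(doc, "//span"))
  for (span in spans) {
    ## Copy each child in front of the span (xml_add_sibling copies by
    ## default), one after another, so the original order is preserved
    for (child in xml_contents(span)) {
      xml_add_sibling(span, child, .where = "before")
    }
    ## Then detach the original span (and its original children) from the tree
    xml_remove(span)
  }
  invisible(doc)
}

## Made-up example: one span with two children, one span with a nested span
doc <- read_html("<p>Hello <span>big <b>bold</b></span> <span>w<span>or</span>ld</span></p>")
unwrap_spans(doc)
xml_text(xml_find_first(doc, "//p"))
## [1] "Hello big bold world"
```

For the file above you'd call unwrap_spans(arf) in place of the xml_replace() line and then write it out the same way.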
EDIT: I think R mangled the em dashes and quotes with its crappy text support, though.
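My guess (not verified against the original file) is that the mangling happens on the way in: when an HTML file doesn't declare a charset, libxml2's HTML parser may fall back to Latin-1, which splits UTF-8 curly quotes and em dashes into junk characters. Here's a sketch, assuming test.html really is UTF-8, that passes the encoding explicitly through read_html()'s encoding argument. The inline string is a stand-in for the file.

```r
library(xml2)

## Stand-in for test.html: curly quotes and an em dash, written with
## \u escapes so this script itself stays plain ASCII
html <- "<p>\u201cI want you to imagine\u2014the person\u201d</p>"

## Tell the parser the input is UTF-8 instead of letting it guess
doc <- read_html(html, encoding = "UTF-8")
txt <- xml_text(xml_find_first(doc, "//p"))

## Round-trip through a temporary file to check the output side as well
out <- tempfile(fileext = ".html")
write_html(doc, out)
back <- xml_text(xml_find_first(read_html(out, encoding = "UTF-8"), "//p"))
identical(txt, back)
## [1] TRUE
```

For the real file that would be read_html("~/test.html", encoding = "UTF-8", options = "RECOVER").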