MobileRead Forums - View Single Post - Indesign-epub-kindle formatting problem: footnotes export with massive indent.

salamanderjuice · 05-05-2021, 03:09 PM

For completeness, here's an example using an HTML/XML parser in my programming language of choice, R. I put one of the HTML snippets from this thread in a file called "test.html".

Code:

##install xml2 package and load it
install.packages("xml2")
library(xml2)

##read in the HTML file
arf <- read_html("~/test.html", options = "RECOVER")
##find all span nodes using xpath selectors
spans <- xml_find_all(arf, "//span")
##Replace them with just their contents
##(spans and content nodes are paired one-to-one, so this assumes each span has exactly one child)
xml_replace(spans, xml_contents(spans))

##Write out the file
write_html(arf, "~/testOut.html")

This nets us an HTML file that looks like:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body id="xx" lang="en-US" style="width:396px;height:612px" xml:lang="en-US">
		<div class="Basic-Text-Frame" id="_idContainer250">
			<div style="width:5760px;height:9540px;position:absolute;top:0px;left:0px;-webkit-transform-origin: 0% 0%; -webkit-transform: translate(0px,5.83px) rotate(0deg) scale(0.05);transform-origin: 0% 0%; transform: translate(0px,5.83px) rotate(0deg) scale(0.05);">
				<p class="Chapter-Title ParaOverride-1">Time to Forgive</p>
<p class="Drop-Cap ParaOverride-1">â€œI want you to imagine your reflection in a beautiful mirrorâ€”the person who caused 
</p>
</div>
</div>
</body></html>

It's way easier to edit HTML/XML with a parser than with regex.
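One caveat with the one-liner above: it works when every span wraps a single text node, which is the case in this snippet. For spans that contain several children or other spans, something like the following might be safer. This is only a rough sketch: unwrap_spans is a helper name I made up (it is not part of xml2), and the test snippet is invented for illustration.

```r
library(xml2)

# Hypothetical helper (not part of xml2): unwrap every <span> in `doc`,
# keeping all of its children, including nested markup, in place.
unwrap_spans <- function(doc) {
  spans <- xml_find_all(doc, "//span")
  # xml_find_all() returns nodes in document order, so walking backwards
  # unwraps inner spans before the spans that contain them.
  for (i in rev(seq_along(spans))) {
    span <- spans[[i]]
    # Copy each child in front of the span, preserving order...
    for (child in xml_contents(span)) {
      xml_add_sibling(span, child, .where = "before")
    }
    # ...then drop the now-redundant span itself.
    xml_remove(span)
  }
  invisible(doc)
}

# Made-up snippet with nested spans and mixed content
doc <- read_html("<p>a <span>b <span>c</span> and <em>d</em></span> e</p>")
unwrap_spans(doc)
```

Nested tags like the em element survive, which is exactly the kind of case where a regex starts to get painful.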

EDIT: I think R mangled the em dashes and quotes with its crappy text support, though.
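On the mangled characters: a likely culprit (I have not confirmed it for this exact file) is that the snippet has no charset declaration, so the parser has to guess the input encoding. read_html() takes an encoding argument, so telling it explicitly may avoid the problem. A minimal sketch, assuming the input really is UTF-8; the temp file and its contents are made up for the test:

```r
library(xml2)

# Write a small UTF-8 test file containing a curly quote and an em dash
path <- tempfile(fileext = ".html")
writeBin(charToRaw(enc2utf8("<p>\u201cmirror\u2014the</p>")), path)

# Tell the parser the input encoding instead of letting it guess
doc <- read_html(path, encoding = "UTF-8", options = "RECOVER")
txt <- xml_text(xml_find_first(doc, "//p"))

# write_html() writes UTF-8 by default
out <- tempfile(fileext = ".html")
write_html(doc, out)
```

If the source file is in some other encoding, the encoding value would need to change to match.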