The thing about etree is that you don't pass around strings like '<span class=...'. Tags are objects, and text nodes are just properties of those objects. Take this XML fragment for example:
Code:
<p class="test">This is a <strong>text node</strong> of a paragraph.</p>
This in etree would be an object for the paragraph node with the following properties:
- Attribute name 'class' set to 'test'
- String attribute 'text' set to 'This is a ' (technically not a string, but you can treat it as a string)
- Child node object for the <strong> with the following properties:
  - String attribute 'text' set to 'text node' (again, not technically a string...)
  - String attribute 'tail' set to ' of a paragraph.' (like 'text', not technically a string)
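To make that layout concrete, here is a minimal sketch that parses the fragment and prints those properties. It assumes the standard library's xml.etree.ElementTree; lxml.etree exposes the same 'attrib', 'text' and 'tail' properties, but which of the two is actually in use here is my assumption.

```python
import xml.etree.ElementTree as ET

fragment = '<p class="test">This is a <strong>text node</strong> of a paragraph.</p>'
p = ET.fromstring(fragment)

# The paragraph's attributes live in a dict-like mapping.
print(p.attrib)

# Text before the first child belongs to the parent's 'text'.
print(repr(p.text))

# The <strong> element is the first (and only) child of the paragraph.
strong = p[0]
print(strong.tag, repr(strong.text))

# Text after </strong> is stored on the <strong> element as 'tail',
# not on the paragraph.
print(repr(strong.tail))
```

Note that ' of a paragraph.' hangs off the <strong> element rather than the paragraph, which is why wrapping text in spans needs to treat 'text' and 'tail' separately.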
So, my thought for fixing this is that anywhere I would nest a span tag, I instead end processing the current span and add the new text node as part of the list of child nodes of the parent. Currently, the above fragment would evaluate to:
Code:
<p class="test"><span class="koboSpan" id="kobo.1.3">This is a <strong><span class="koboSpan" id="kobo.1.1">text node</span></strong><span class="koboSpan" id="kobo.1.2"> of a paragraph.</span></span></p>
What I hope it will become is:
Code:
<p class="test"><span class="koboSpan" id="kobo.1.1">This is a </span><strong><span class="koboSpan" id="kobo.1.2">text node</span></strong><span class="koboSpan" id="kobo.1.3"> of a paragraph.</span></p>
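A rough sketch of that idea, not the actual code: each 'text' and each 'tail' gets its own span, and tail spans are added as siblings in the parent's child list instead of being nested. The function name add_kobo_spans, the 'para' argument and the shared counter are hypothetical names for illustration, and it assumes one span per text chunk with no further splitting.

```python
import xml.etree.ElementTree as ET


def add_kobo_spans(elem, para=1, counter=None):
    """Wrap every text and tail under elem in its own koboSpan, without nesting spans."""
    if counter is None:
        counter = [0]  # mutable counter shared across the recursion

    def new_span(text):
        counter[0] += 1
        span = ET.Element(
            "span", {"class": "koboSpan", "id": "kobo.%d.%d" % (para, counter[0])}
        )
        span.text = text
        return span

    # Detach the existing children so the child list can be rebuilt in order.
    children = list(elem)
    for child in children:
        elem.remove(child)

    rebuilt = []
    if elem.text:
        # Leading text becomes the first child span.
        rebuilt.append(new_span(elem.text))
        elem.text = None

    for child in children:
        tail = child.tail
        child.tail = None
        add_kobo_spans(child, para, counter)  # number the child's own text first
        rebuilt.append(child)
        if tail:
            # The tail becomes a sibling span right after the child.
            rebuilt.append(new_span(tail))

    elem.extend(rebuilt)
    return elem


p = ET.fromstring('<p class="test">This is a <strong>text node</strong> of a paragraph.</p>')
add_kobo_spans(p)
print(ET.tostring(p, encoding="unicode"))
```

Because the counter is bumped in document order (parent text, then the child's text, then the child's tail), the ids come out as kobo.1.1, kobo.1.2, kobo.1.3 from left to right.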