I've partly solved the second part of the problem, finding the unique subchapters that have markup like:
Code:
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">
<div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">
On being busy: Corrigan's secret door
</div>
With find:
<div class="TLV1"\s+(.*?)\s+<div class="HD"(\s+(.*?)\s+)(\s+(.*?)\s+)</div>
And replace:
<div class="TLV1" \1<h2 class="HD"\2\4</h2>
But now it picks up SIDEBAR elements as well. So any search string that skips matches containing the word SIDEBAR should work for both cases.
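One possible way to skip SIDEBAR, as a rough sketch rather than a tested fix for the real file: guard the first lazy group with a negative lookahead so the match can never run across the word SIDEBAR on its way from the TLV1 tag to the HD tag. The Python below only simulates the find and replace. The second block in the sample (ids X1, S1, H10-2) is made-up markup, and it assumes the sidebar shows up as a class name between the TLV1 opening tag and the heading, and that the search runs with "dot matches newline" on.

```python
import re

# Hypothetical sample: the first TLV1 is a real subchapter; the second TLV1's
# first heading belongs to a nested SIDEBAR block (assumed markup).
# Lines are indented because the pattern's adjacent \s+)(\s+ needs at least
# two whitespace characters in a row.
SAMPLE = """<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">
  <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">
    On being busy: Corrigan's secret door
  </div>
<div class="TLV1" id="X1">
  <div class="SIDEBAR" id="S1">
  <div class="HD" id="H10-2">
    A sidebar heading
  </div>
"""

# The find string from the post, with (?s) so that . also matches newlines.
ORIGINAL = r'(?s)<div class="TLV1"\s+(.*?)\s+<div class="HD"(\s+(.*?)\s+)(\s+(.*?)\s+)</div>'

# Same pattern, but the first (.*?) becomes ((?:(?!SIDEBAR).)*?): each
# character is consumed only if the word SIDEBAR does not start there.
TEMPERED = r'(?s)<div class="TLV1"\s+((?:(?!SIDEBAR).)*?)\s+<div class="HD"(\s+(.*?)\s+)(\s+(.*?)\s+)</div>'

# The replace string from the post; group numbers are unchanged because the
# added groups are non-capturing.
REPLACEMENT = r'<div class="TLV1" \1<h2 class="HD"\2\4</h2>'

naive = re.sub(ORIGINAL, REPLACEMENT, SAMPLE)
fixed = re.sub(TEMPERED, REPLACEMENT, SAMPLE)

# Number of headings rewritten by each pattern, then the guarded result.
print(naive.count("<h2"), fixed.count("<h2"))
print(fixed)
```

On this sample the original pattern rewrites both headings, while the guarded one rewrites only the subchapter heading and leaves the sidebar heading as a div. If the sidebars sit somewhere else relative to the heading, the lookahead would have to guard a different group.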