Hello.
I'm trying to find and replace elements in the HTM documents of a decompiled CHM, turning them into chapter headings so that I can create a TOC. Sub-chapters are identified by markup like this:
Code:
<div class="TLV1" id="B01306002.0-103" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[2]">
<div class="HD">
Taking a history
</div>
I used this find expression:
<div class="TLV1"\s+(.*?)\s+ <div class="HD">\s+(.*?)\s+</div>
with this replacement:
<div class="TLV1" \1<h2 class="HD">\2</h2>
It works for these instances, but it's not perfect.
With the above expression, a few non-sub-chapter elements (box items) are matched as well. They look like this:
Code:
<div class="SIDEBAR BOX">
<div class="TLV1" id="B01306002.0-167" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[7]/SIDEBAR[2]/TLV1[1]">
<div class="HD">
The jugular venous systems
</div><a id="T5-2"></a>
The constant part of the class name is SIDEBAR (there are also SIDEBAR LIST, etc.).
To add to the complexity, sub-chapters also exist in a second form, which my original search string does not pick up:
Code:
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">
<div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">
On being busy: Corrigan's secret door
</div>
What search string would match both sub-chapter forms and ignore elements marked SIDEBAR?
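To make the goal concrete, here is a minimal Python sketch of the kind of expression being asked for. It is not a definitive answer. It assumes that every TLV1 block inside a sidebar carries the word SIDEBAR somewhere in its own attributes (as the id_xpath does in the box sample above), and that the HD div always follows the TLV1 div directly. The names HEADING_RE and promote_headings are made up for illustration.

```python
import re

# Assumption: a TLV1 div that belongs to a sidebar mentions SIDEBAR in its
# own attributes (e.g. in id_xpath), so a negative lookahead can skip it.
HEADING_RE = re.compile(
    r'(<div class="TLV1"(?![^>]*SIDEBAR)[^>]*>\s*)'  # group 1: outer TLV1 div
    r'<div class="HD"([^>]*)>\s*'                    # group 2: optional HD attributes
    r'(.*?)\s*</div>',                               # group 3: heading text
    re.DOTALL,
)


def promote_headings(html: str) -> str:
    """Turn the HD div of each non-sidebar TLV1 block into an h2 element."""
    return HEADING_RE.sub(r'\1<h2 class="HD"\2>\3</h2>', html)


if __name__ == "__main__":
    sample = (
        '<div class="TLV1" id="B01306002.0-90" '
        'id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">\n'
        '<div class="HD" id="H10-1" '
        'id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">\n'
        "On being busy: Corrigan's secret door\n"
        '</div>'
    )
    print(promote_headings(sample))
```

The `[^>]*` after `class="HD"` is what lets one expression cover both the plain HD div and the one carrying id and id_xpath attributes. The same pattern may carry over to an editor's find/replace dialog if its regex flavour supports lookahead and lets `.` match newlines, but that depends on the tool.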