12-29-2012, 01:14 AM | #1 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2012
Location: KL, Malaysia
Device: Freda (WP 7.8) EPUB reader app
|
Need help for a regex
Hello.
I'm trying to find and replace elements in HTM documents from a decompiled CHM, turning sub-chapter titles into headings so that I can build a TOC. Sub-chapters are identified by markup like this:
Code:
<div class="TLV1" id="B01306002.0-103" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[2]"> <div class="HD"> Taking a history </div>
A search based on the above also matches a few non-subchapter elements (box items), which look like this:
Code:
<div class="SIDEBAR BOX"> <div class="TLV1" id="B01306002.0-167" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[7]/SIDEBAR[2]/TLV1[1]"> <div class="HD"> The jugular venous systems </div><a id="T5-2"></a>
To add to the complexity, there is a second form of sub-chapter markup, which my original search string cannot pick up:
Code:
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]"> <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]"> On being busy: Corrigan's secret door </div>
What search string would match both sub-chapter forms while ignoring elements marked SIDEBAR? |
12-29-2012, 02:26 AM | #3 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2012
Location: KL, Malaysia
Device: Freda (WP 7.8) EPUB reader app
|
I've partly solved the second part of the problem, matching the sub-chapters whose markup looks like this:
Code:
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]"> <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]"> On being busy: Corrigan's secret door </div>
The search string:
<div class="TLV1"\s+(.*?)\s+<div class="HD"(\s+(.*?)\s+)(\s+(.*?)\s+)</div>
And replace:
<div class="TLV1" \1<h2 class="HD"\2\4</h2>
But now it picks up SIDEBAR elements as well. So any search string that skips elements marked SIDEBAR should handle both sub-chapter forms. |
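One possible way to skip the boxed items, sketched here in Python purely for illustration: in the samples quoted in this thread, the TLV1 wrapper of a box item carries "SIDEBAR" inside its id_xpath attribute, so a negative lookahead on the wrapper's attributes can reject it. This assumes every box item follows that pattern and that a heading never contains a nested div; the names SUBCHAPTER and promote_headings are made up for the example, and the lookahead syntax may need adjusting for other regex engines.

```python
import re

# Match a TLV1 wrapper followed by its HD heading div.
# (?![^>]*SIDEBAR) rejects wrappers whose own attributes mention SIDEBAR
# (assumption: box items always have "/SIDEBAR[n]/" in id_xpath, as in the samples).
# [^>]* on the HD div covers both forms: with and without id/id_xpath attributes.
SUBCHAPTER = re.compile(
    r'<div class="TLV1"(?![^>]*SIDEBAR)([^>]*)>\s*'
    r'<div class="HD"([^>]*)>\s*(.*?)\s*</div>',
    re.DOTALL,
)


def promote_headings(html: str) -> str:
    """Turn the HD div of each non-sidebar TLV1 section into an h2."""
    return SUBCHAPTER.sub(r'<div class="TLV1"\1> <h2 class="HD"\2>\3</h2>', html)


# The three samples from the thread.
plain = """<div class="TLV1" id="B01306002.0-103" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[2]"> <div class="HD"> Taking a history </div>"""
with_attrs = """<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]"> <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]"> On being busy: Corrigan's secret door </div>"""
sidebar = """<div class="SIDEBAR BOX"> <div class="TLV1" id="B01306002.0-167" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[7]/SIDEBAR[2]/TLV1[1]"> <div class="HD"> The jugular venous systems </div><a id="T5-2"></a>"""

if __name__ == "__main__":
    for sample in (plain, with_attrs, sidebar):
        print(promote_headings(sample))
```

The first two samples come back with an h2 heading, while the sidebar sample is left untouched. The pattern itself, without the Python wrapper, should also be usable in an editor's find-and-replace if its regex flavour supports lookahead.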
12-31-2012, 11:25 PM | #4 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
I did one of these conversions a while back. I'm not sure why you're trying to preserve the sidebar; it's not going to render correctly on most readers anyway. Strip it all out and just use a real ToC instead.
|
01-02-2013, 04:42 AM | #5 |
Member
Posts: 15
Karma: 10
Join Date: Dec 2012
Location: KL, Malaysia
Device: Freda (WP 7.8) EPUB reader app
|
Thanks for the tip.
I'm preserving the markup only because I'm not too sure what it's for; it may be there for the JavaScript used in places, particularly for the inline CHM TOC (which didn't work in the CHM anyway, for some reason). For my particular case, however, I decided to turn all SIDEBAR elements into h3 headings, making them sub-elements of the sub-chapters. That lets me use the regex I've already found to work, rather than needing a single all-encompassing regex to tell them apart. |
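The h3 approach described above could be sketched roughly as follows. This is a one-pass Python variant written for illustration, not the exact search-and-replace steps used in this thread: a replacement function picks h3 when the TLV1 wrapper's attributes mention SIDEBAR and h2 otherwise. The names HEADING, retag and retag_headings are hypothetical, and the sketch assumes sidebar sections can be recognised from id_xpath as in the posted samples.

```python
import re

# One pattern for every TLV1 section heading, sidebar or not.
HEADING = re.compile(
    r'<div class="TLV1"([^>]*)>\s*<div class="HD"([^>]*)>\s*(.*?)\s*</div>',
    re.DOTALL,
)


def retag(match: re.Match) -> str:
    """Build the replacement: h3 for sidebar sections, h2 for sub-chapters."""
    wrapper_attrs, hd_attrs, title = match.groups()
    # Assumption: box items carry "SIDEBAR" in the wrapper's id_xpath attribute.
    level = "h3" if "SIDEBAR" in wrapper_attrs else "h2"
    return (
        f'<div class="TLV1"{wrapper_attrs}> '
        f'<{level} class="HD"{hd_attrs}>{title}</{level}>'
    )


def retag_headings(html: str) -> str:
    return HEADING.sub(retag, html)


subchapter = """<div class="TLV1" id="B01306002.0-103" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[2]"> <div class="HD"> Taking a history </div>"""
box_item = """<div class="SIDEBAR BOX"> <div class="TLV1" id="B01306002.0-167" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[7]/SIDEBAR[2]/TLV1[1]"> <div class="HD"> The jugular venous systems </div><a id="T5-2"></a>"""

if __name__ == "__main__":
    print(retag_headings(subchapter))
    print(retag_headings(box_item))
```

Because sidebar headings end up one level below the sub-chapter headings, a TOC generated from h2 and h3 would nest the box items under the sub-chapter they appear in.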
|