#1
Member
Posts: 15
Karma: 10
Join Date: Dec 2012
Location: KL, Malaysia
Device: Freda (WP 7.8) EPUB reader app

Need help for a regex

Hello.
I'm trying to find and replace elements in HTM files from a decompiled CHM, turning them into chapter headings so that I can build a TOC. The sub-chapters are marked up like this:

```html
<div class="TLV1" id="B01306002.0-103" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[2]">
  <div class="HD">
    Taking a history
  </div>
```

If I search for the pattern above, a few elements that are not sub-chapters (boxed items) are matched as well. They look like this:

```html
<div class="SIDEBAR BOX">
  <div class="TLV1" id="B01306002.0-167" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[7]/SIDEBAR[2]/TLV1[1]">
    <div class="HD">
      The jugular venous systems
    </div><a id="T5-2"></a>
```

To add to the complexity, there is one more form of sub-chapter markup, where the HD div carries its own id attributes, and my original search string cannot pick it up:

```html
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">
  <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">
    On being busy: Corrigan's secret door
  </div>
```

What search string would match both sub-chapter forms while ignoring the elements marked SIDEBAR?

#3

I've partly solved the problem for the second form of sub-chapter markup:

```html
<div class="TLV1" id="B01306002.0-90" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]">
  <div class="HD" id="H10-1" id_xpath="/CHAPTER[1]/TBD[1]/TLV1[18]/HD[1]">
    On being busy: Corrigan's secret door
  </div>
```

Search:

```
<div class="TLV1"\s+(.*?)\s+<div class="HD"(\s+(.*?)\s+)(\s+(.*?)\s+)</div>
```

Replace:

```
<div class="TLV1" \1<h2 class="HD"\2\4</h2>
```

But this now picks up the SIDEBAR elements as well. So any search string that skips matches containing the word SIDEBAR should work for both forms.

#4
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3

I did one of these conversions a while back. I'm not sure why you're trying to preserve the sidebar; it isn't going to render correctly on most readers anyway. I'd strip it all out and just use a real ToC instead.

#5

Thanks for the tip.

I'm only preserving that code because I'm not sure what it's for. It may be there because of the JavaScript used for the inline CHM TOC (which didn't work in the CHM anyway, for some reason). For my particular case, though, I decided to make all SIDEBAR elements h3, which turns them into sub-elements of the sub-chapters. That lets me keep using the regex I've already got working, rather than having to tell them all apart with one all-encompassing regex.
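The two-pass approach described above can be sketched like this: first turn the headings inside SIDEBAR boxes into `h3`, so they no longer look like `<div class="HD">`, then run the `h2` conversion over whatever is left. This Python version is a hypothetical illustration; `SIDEBAR_HD`, `CHAPTER_HD` and `convert` are made-up names, both patterns are simplified stand-ins, and the second one is not the regex from the thread.

```python
import re

# Pass 1 (hypothetical): HD divs whose TLV1 id_xpath mentions SIDEBAR become <h3>.
SIDEBAR_HD = re.compile(
    r'(<div class="TLV1"[^>]*id_xpath="[^"]*SIDEBAR[^"]*">)'
    r'\s*<div class="HD"([^>]*)>\s*(.*?)\s*</div>'
)

# Pass 2 (hypothetical): every remaining TLV1 + HD pair becomes <h2>.
CHAPTER_HD = re.compile(
    r'(<div class="TLV1"[^>]*>)\s*<div class="HD"([^>]*)>\s*(.*?)\s*</div>'
)


def convert(html: str) -> str:
    html = SIDEBAR_HD.sub(r'\1<h3 class="HD"\2>\3</h3>', html)
    # Sidebar headings are no longer <div class="HD">, so pass 2 cannot match them.
    return CHAPTER_HD.sub(r'\1<h2 class="HD"\2>\3</h2>', html)
```

Because pass 2 only looks for `<div class="HD"`, it no longer needs to know anything about SIDEBAR; ordering the passes does the filtering.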

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| RegEx Help | ghostyjack | Workshop | 4 | 03-22-2012 10:24 AM | 
| Regex | Gunnerp245 | Conversion | 5 | 03-05-2012 05:15 PM | 
| Help me with regex please. | eVrajka | Library Management | 5 | 08-15-2011 01:17 PM | 
| regex help please | thevoiceofcheese | Calibre | 2 | 08-02-2011 12:27 AM | 
| Regex | Faster | Sigil | 2 | 04-24-2011 10:08 PM |