| 
			
			 | 
		#1 | 
| 
			
			
			
			 Zealot 
			
			Posts: 146 
				Karma: 194 
				Join Date: Jun 2010 
				Location: Melbourne 
				
				
				Device: iPad 
				
				
				 | 
	
	
	
		
		
			
			 
				
				Regex to search at beginning of line
			 
			
			
			Hi all. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I am searching for this type of code:
	Code:
	<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>
	My Find string is:
	Code:
	<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
	My Replace string is:
	Code:
	<div class="\1"><p class="\1"><span class="\1">\3</span></p><p class="\1">\4</p></div>
	I tried putting "^" in front of my search string, but this didn't stop it from finding the same p tag inside the div tag. Any ideas? Thanks in advance.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | |
| 
			
			
			
			 Wizard 
			
			Posts: 1,877 
				Karma: 8821117 
				Join Date: Mar 2013 
				Location: Rosario - Santa Fe - Argentina 
				
				
				Device: Kindle 4 NT 
				
				
				 | 
	
	
	
		
		
		
		
	
 In the "Find" field, try using the following string:
	Code:
	<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>
	And in the "Replace" field:
	Code:
	<div class="list\1"><p class="list\1"><span class="list\1">\3</span></p><p class="list\1">\4</p></div>
	But it makes little sense to use the same class for the <p> and <span> tags, because both of them apply the same formatting. In that case the <span> is redundant, and you could use the following string in the "Replace" field instead:
	Code:
	<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>
	Rubén  | 
|
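For anyone who wants to check the Find and Replace strings from this post outside Sigil, here is a minimal sketch using Python's `re` module. Using Python is an assumption made for illustration: Sigil's own regex engine may differ in details, although capture groups and `\1`-style backreferences should behave the same way for this pattern. The sample line is the one from post #1.

```python
import re

# Find and Replace strings from the post above (the shorter Replace, without the span).
FIND = r'<p class="list(.+)"><span class="list(.*)">(.+)</span>(.+)</p>'
REPLACE = r'<div class="list\1"><p class="list\1">\3</p><p class="list\1">\4</p></div>'

# Sample paragraph taken from post #1.
sample = '<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>'

# \1 is the class suffix, \3 the list number, \4 the paragraph text.
result = re.sub(FIND, REPLACE, sample)
print(result)

# Searching the converted line again: the Find string needs a <span>, which is gone.
rematch = re.search(FIND, result)
print(rematch)
```

In this sample the converted line is not found a second time, because the shorter Replace string drops the `<span>` that the Find string requires. The longer Replace string, which keeps the `<span>`, would still be matched again.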
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Zealot 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks for the info. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I understand what you are telling me to do but I am trying to understand the Regex syntax. Why doesn't the ^ tell the find to begin at the beginning of the line?  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | |
| 
			
			
			
			 Grand Sorcerer 
			
			Posts: 28,891 
				Karma: 207182180 
				Join Date: Jan 2010 
				
				
				
				Device: Nexus 7, Kindle Fire HD 
				
				
				 | 
	
	
	
		
		
		
		
	
 First method, with multiline mode switched on so that ^ matches at the start of every line:
	Code:
	(?m)^<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
	Second method, with a negative lookbehind that rejects any <p> tag immediately preceded by ">":
	Code:
	(?<!\>)<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>
	I highly recommend using something similar to the second method, since the common practice of indenting HTML code for readability could easily mess with your idea of what might constitute "the beginning" of a line.
	Last edited by DiapDealer; 05-18-2013 at 12:48 PM.  | 
|
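The difference between the two methods can be tried out with a short sketch in Python's `re` module (an assumption for illustration; Sigil's engine may differ in details). The `converted` line is what the Replace string from post #1 would produce for the sample paragraph, and the `indented` line is a hypothetical pretty-printed version of the original with a leading tab.

```python
import re

BODY = r'<p class="(list.+)"><span class="(list.*)">(.+)</span>(.+)</p>'
ANCHORED = re.compile(r'(?m)^' + BODY)      # first method: multiline ^
LOOKBEHIND = re.compile(r'(?<!\>)' + BODY)  # second method: not preceded by ">"

original = '<p class="list1"><span class="list1">2.</span>Mathematics disorder (315.1)</p>'
converted = ('<div class="list1"><p class="list1"><span class="list1">2.</span></p>'
             '<p class="list1">Mathematics disorder (315.1)</p></div>')
indented = '\t' + original  # hypothetical indented line

results = {}
for name, text in [("original", original), ("converted", converted), ("indented", indented)]:
    results[name] = (bool(ANCHORED.search(text)), bool(LOOKBEHIND.search(text)))
    print(name, results[name])
```

Both methods find the original paragraph and skip the already converted one, but only the lookbehind still finds the paragraph once it is indented.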
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Zealot 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			THANK you so much! 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I'll give this a try tomorrow. It is now 1.30 am so I have to sleep! Thanks again for the details. I really appreciate your time in helping me. Keep safe,  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | |
| 
			
			
			
			 Well trained by Cats 
			
			Posts: 31,267 
				Karma: 61916422 
				Join Date: Aug 2009 
				Location: The Central Coast of California 
				
				
				Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A 
				
				
				 | 
	
	
	
		
		
		
		
	
 Try to use the least 'greedy' pattern match, rather than trapping with a big net.
	Code:
	(list\d+)
 | 
|
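A small sketch of why the narrower group helps, written with Python's `re` module as a stand-in for Sigil's engine (an assumption). It also assumes the class names are always "list" followed by digits, as in the `list1` sample from post #1; the two-paragraph line is made up for the test.

```python
import re

# Hypothetical line holding two list paragraphs side by side.
line = '<p class="list1">a</p><p class="list2">b</p>'

# The broad group runs on to the last double quote in the line.
broad = re.findall(r'class="(list.+)"', line)

# The narrow group stops as soon as the digits end.
narrow = re.findall(r'class="(list\d+)"', line)

print(broad)
print(narrow)
```

The broad pattern returns a single capture that swallows the markup between the two class attributes, while the narrow one returns the two class names separately.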
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 Grand Sorcerer 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			An excellent point.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Guru 
			
			Posts: 698 
				Karma: 150000 
				Join Date: Feb 2010 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			It could also be that "pretty print" is adding whitespace to the beginning of the line (e.g. tab?). 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I did a quick experiment, and while
	Code:
	find: ^<p class=
	did not match the indented tag,
	Code:
	find: ^\W*<p class=
	did.  | 
| 
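The experiment can be reproduced roughly as follows in Python's `re` module (an assumption; Sigil's engine may behave differently in details). The three sample lines are hypothetical, and `re.MULTILINE` is switched on so that `^` matches at the start of each line.

```python
import re

# Hypothetical pretty-printed markup: no indent, a tab, and two spaces.
text = '<p class="a">x</p>\n\t<p class="b">y</p>\n  <p class="c">z</p>'

# Bare ^ only accepts tags that sit directly at the start of a line.
strict = re.findall(r'^<p class="(\w+)"', text, re.MULTILINE)

# ^\W* allows any run of non-word characters (tabs and spaces included) first.
loose = re.findall(r'^\W*<p class="(\w+)"', text, re.MULTILINE)

print(strict)
print(loose)
```

Only the unindented paragraph is found by the first pattern, while the second one finds all three.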
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 Zealot 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thank you all for the information. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I did the (?m)^ and it worked. It didn't find the p tag inside the div tag. When I removed the div tag from the front and the end of the paragraph and searched again, it did not find the p tag in its original format. I took note of what st_albert said and voilà, it worked!
	I assume the \W means white space? Is there a nice succinct cheat sheet somewhere I can download that has all the codes used by Sigil? I have seen this page: http://docs.python.org/2/library/re.html but it is a bit daunting for a newbie!
	Thanks again to all of you for your help!  | 
| 
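On the \W question: in the usual regex syntax \W means "any non-word character", which includes whitespace but also punctuation such as `<` or `-`, while \s is the class for whitespace only. A quick sketch in Python's `re` module (an assumption for illustration; the sample characters and the test string are made up) shows the difference.

```python
import re

# For each character: does \W match it, and does \s match it?
samples = [" ", "\t", "<", "-", "a", "7", "_"]
table = {ch: (bool(re.fullmatch(r'\W', ch)), bool(re.fullmatch(r'\s', ch)))
         for ch in samples}
for ch, flags in table.items():
    print(repr(ch), flags)

# A line that starts with punctuation rather than whitespace.
odd_line = '-- <p class='
w_hit = bool(re.match(r'\W*<p class=', odd_line))  # \W* skips the dashes too
s_hit = bool(re.match(r'\s*<p class=', odd_line))  # \s* only skips whitespace
print(w_hit, s_hit)
```

So `^\W*` works for indented lines, but `^\s*` is the tighter choice if only leading whitespace should be allowed.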
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | 
| 
			
			
			
			 Grand Sorcerer 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			The whitespace preceding indented lines of html is exactly why I suggested using a negative lookbehind assertion instead of the ^ metacharacter.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 Well trained by Cats 
			
				
				
				 | 
	
	
	
		
		
		
		
		  Search for 'cheatography regex' and look for the download PDF button. The site has other useful cheat sheets (css2, html).  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 Zealot 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks so much! 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I remember seeing this some time back but never kept a bookmark for it. Thank you!  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | 
| 
			
			
			
			 Wizard 
			
			Posts: 1,088 
				Karma: 11502975 
				Join Date: Mar 2013 
				Location: Guben, Brandenburg, Germany 
				
				
				Device: Kobo Clara 2E, Tolino Shine 3 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I had put this question into an old thread, so nobody read it. Please don't mind me repeating it here: 

	"I have some ebooks where page numbers appear in the running text (probably referring to the original printed version), sometimes even with a hyperlink referring to the original TOC. I would like to delete them, but I have no clue about regex. In one particular book, the numbers appear in square brackets, such as [Pg 4]. Those numbers have up to three digits. The tags look like this: <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>. Is there a way of removing them with a single regex command in Sigil or Calibre?"
	By the way: it would be very nice if Sigil had a feature for doing such operations without having to be a regex expert (I tried a lot of things, but nothing came even close to working). I have no idea whether this is technically possible, but I think many users would be happy to be able to manipulate their books in an easier way.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#14 | 
| 
			
			
			
			 Guru 
			
			Posts: 657 
				Karma: 64171 
				Join Date: Sep 2010 
				Location: Kent, England, Sol 3, ZZ9 plural Z Alpha 
				
				
				Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin) 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			If each of the page numbers is wrapped in <span class="pagenum"> and contains no further embedded spans, then a simple regex would be 

	Code:
	<span class="pagenum".+/span>
	You may need to insert a question mark after the plus sign to make it less greedy; otherwise, on a line with more than one page number, the match runs through to the last /span>.  | 
| 
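Here is a minimal sketch of the greedy versus non-greedy behaviour, using Python's `re` module as a stand-in for Sigil or Calibre (an assumption). The first page-number span is the one quoted in post #13; the second span and the surrounding text are made up so that two page numbers sit on one line.

```python
import re

# Hypothetical line with two page-number spans in the running text.
line = ('<p>Some text <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>'
        'and more <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_5">[Pg 5]</a></span>here.</p>')

# Greedy: .+ runs from the first span to the LAST /span> on the line.
greedy = re.sub(r'<span class="pagenum".+/span>', '', line)

# Non-greedy: .+? stops at the first /span> after each opening tag.
lazy = re.sub(r'<span class="pagenum".+?/span>', '', line)

print(greedy)
print(lazy)
```

The greedy version also deletes the words between the two page numbers, while the non-greedy version removes only the spans themselves. An empty Replace field does the same job as the empty replacement string here.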
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#15 | 
| 
			
			
			
			 Wizard 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Ah, thanks! I always thought that it must be quite simple. I had inserted a d+ after the "pagenum". That led, I don't know why, to nothing. And I thought that I would somehow have to include the [Pg 4] expression.  
		
	
		
		
		
		
		
		
		
		
		
		
	
	Well, I'll try it.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 