| 
			
			 | 
		#1 | 
| 
			
			
			
			 Junior Member 
			
			![]() Posts: 7 
				Karma: 10 
				Join Date: Aug 2021 
				
				
				
				Device: Kindle 
				
				
				 | 
	
	
	
		
		
			
			 
				
				Need help with regex
			 
			
			
			Hi, 
		
	
		
		
		
		
		
		
		
		
		
		
		
			Firstly, let me say that I am a very rudimentary user of regex; most of it is beyond my comprehension. I have some eBooks that were clearly produced by less-than-spectacular OCR software, so the formatting ranges from quite good to really bad. One of the main problems is line breaks in the wrong places (e.g. in the middle of a sentence), which makes the text very difficult to follow.
			In F&R I have used this "[a-z]</p><p class="calibre_1">" - or similar - to find these instances quite successfully. The problem is that the entire match is selected, and I cannot for the life of me work out how to get the replace function to leave the [a-z] character alone. Without that, fixing all the errors can take hundreds of manual interventions.
			Any assistance is gratefully accepted. Thanks, Paul
			Last edited by jordy1955; 06-17-2022 at 10:02 PM.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Fanatic 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 531 
				Karma: 2268308 
				Join Date: Nov 2015 
				
				
				
				Device: none 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Use 
		
	
		
		
		
		
		
		
		
		
		
		
	
	(?<=\p{Ll})</p>\s*<p class="...">  | 
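A minimal sketch of what the lookbehind does, written for Python's standard `re` module rather than Calibre's editor. The sample text is hypothetical, the class name `calibre_1` is taken from the first post, and `[a-z]` is an ASCII-only stand-in for `\p{Ll}`, which the standard `re` module does not support.

```python
import re

# Hypothetical sample: an OCR'd sentence split across two paragraphs.
html = ('<p class="calibre_1">The cat sat on the</p>\n'
        '<p class="calibre_1">mat and purred.</p>')

# (?<=[a-z]) only checks that a lowercase letter comes before the match.
# The letter itself is not part of the match, so it is not replaced.
pattern = re.compile(r'(?<=[a-z])</p>\s*<p class="calibre_1">')

# Replace the bad paragraph break with a single space.
joined = pattern.sub(' ', html)
print(joined)
```

Because the letter is never selected, the replace field only needs the single space.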
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Grand Sorcerer 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 24,905 
				Karma: 47303824 
				Join Date: Jul 2011 
				Location: Sydney, Australia 
				
				
				Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			That doesn't work for me and I can't work out what the lookbehind is supposed to do.  
		
	
		
		
		
		
		
		
		
		
		
		
	
	I use this search: Code: 
	([\w,—])</p>\s*<p\s*[^>]*?>([\w])
	with this replacement: Code: 
	\1 \2  | 
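As a rough illustration of how the two capture groups and `\1 \2` fit together, here is the same search and replacement run through Python's `re` module. The sample text is hypothetical; Calibre's editor may differ in details from the standard `re` module.

```python
import re

# Hypothetical sample: a sentence broken after a comma.
html = ('<p class="calibre_1">He opened the door,</p>\n'
        '<p class="calibre_1">Edward stepped inside.</p>')

# Group 1 captures the last character before </p>.
# Group 2 captures the first character of the next paragraph.
search = r'([\w,—])</p>\s*<p\s*[^>]*?>([\w])'

# \1 and \2 put the two captured characters back, separated by a space,
# so only the tags between them are dropped.
replace = r'\1 \2'

fixed = re.sub(search, replace, html)
print(fixed)
```

Both captured characters are part of the match, which is why they appear selected in the search result, but the replacement writes them straight back.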
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Junior Member 
			
			![]() Posts: 7 
				Karma: 10 
				Join Date: Aug 2021 
				
				
				
				Device: Kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			This is what my query returns. 
		
	
		
		
			I need to exclude the single character - in this case the "E" - either from the search result or in the replace function. Last edited by jordy1955; 06-17-2022 at 11:20 PM. Reason: typo  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | |
| 
			
			
			
			 Junior Member 
			
			![]() Posts: 7 
				Karma: 10 
				Join Date: Aug 2021 
				
				
				
				Device: Kindle 
				
				
				 | 
	
	
	
		
		
		
		
	
 This works, BUT it also returns the first character of the following word - see image. How, then, do I exclude the unwanted characters in the replace field? I've got no idea what the \1 \2 means. Last edited by jordy1955; 06-17-2022 at 11:24 PM.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | |
| 
			
			
			
			 Junior Member 
			
			![]() Posts: 7 
				Karma: 10 
				Join Date: Aug 2021 
				
				
				
				Device: Kindle 
				
				
				 | 
	
	
	
		
		
		
		
	
 Thank you so much. You have saved me hours of manual intervention and frustration.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | |
| 
			
			
			
			 Grand Sorcerer 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 24,905 
				Karma: 47303824 
				Join Date: Jul 2011 
				Location: Sydney, Australia 
				
				
				Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos 
				
				
				 | 
	
	
	
		
		
		
		
	
 Another one I have used recently was: Code: 
	([[:lower:]])\s*</p>\s*<p>\s*([[:lower:]])
	This one doesn't cater for the class. If I am doing this amount of fixing, I remove the class from the normal paragraphs first. If there are any left, it probably means there is other formatting that I don't want to lose.  | 
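A small sketch of how this variant behaves, again using Python's standard `re` module as a stand-in for Calibre's editor. The standard `re` module does not support the POSIX class `[[:lower:]]`, so `[a-z]` is used here as an ASCII-only approximation. The sample strings and the `quote` class name are hypothetical.

```python
import re

# [a-z] approximates [[:lower:]]; the \s* parts also swallow stray
# whitespace on either side of the paragraph break.
pattern = r'([a-z])\s*</p>\s*<p>\s*([a-z])'

# Plain paragraphs with no class: the break is joined.
plain = '<p>the quick brown </p>\n<p> fox jumped.</p>'
fixed_plain = re.sub(pattern, r'\1 \2', plain)

# Paragraphs that still carry a class are left untouched,
# because the pattern only matches a bare <p>.
styled = '<p class="quote">end of line</p>\n<p class="quote">next</p>'
fixed_styled = re.sub(pattern, r'\1 \2', styled)

print(fixed_plain)
print(fixed_styled)
```

Leaving classed paragraphs alone is the point of the approach described above: whatever still has a class after the cleanup is likely formatting worth keeping.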
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | |
| 
			
			
			
			 Guru 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 793 
				Karma: 1538394 
				Join Date: Sep 2013 
				
				
				
				Device: Kobo Sage 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Code: 
	([\w,—])<\/p>\s*<p\s*[^>]*?>([\w])  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 Guru 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 793 
				Karma: 1538394 
				Join Date: Sep 2013 
				
				
				
				Device: Kobo Sage 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			@jordy1955: I've been using https://regex101.com/ to try various regex things and see what they do. It's been a lot of help. One thing to note, though: the replacement character they use there is a $ instead of the \ used in Calibre's editor. So, if you wanted to test davidfor's replacement string of: Code: 
	\1 \2
	you would enter it there as: Code: 
	$1 $2  | 
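For longer replacement strings, the swap can be done mechanically. This is only a sketch: `to_dollar_style` is a hypothetical helper name, and it assumes the replacement contains nothing but simple numbered references such as `\1`.

```python
import re

def to_dollar_style(replacement: str) -> str:
    # Rewrite each backslash-plus-digits group reference as dollar-plus-digits.
    # In the pattern, \\ matches one literal backslash and (\d+) the number.
    return re.sub(r'\\(\d+)', r'$\1', replacement)

converted = to_dollar_style(r'\1 \2')
print(converted)
```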
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | 
| 
			
			
			
			 Junior Member 
			
			![]() Posts: 7 
				Karma: 10 
				Join Date: Aug 2021 
				
				
				
				Device: Kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Awesome stuff, guys. Just ran it on a book and - once I got my head around it properly - I completed the editing and re-formatting in about 1 hour, about 4 hours less than it usually takes me. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I'll get much quicker with practice but this is great. Again, thanks SO MUCH. Paul  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#15 | |
| 
			
			
			
			 Guru 
			
			![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 793 
				Karma: 1538394 
				Join Date: Sep 2013 
				
				
				
				Device: Kobo Sage 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 This has been a productive thread for me: I found a much better search/replace for fixing badly split paragraphs, I learned that I could change the behavior of the regex101 site to match Calibre's editor, and some of the search strings I use will be easier now that I won't have to escape the / character. Thanks.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 