|  06-17-2022, 08:58 PM | #1 | 
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | 
				
				Need help with regex
			 
			
			Hi. Firstly, let me say that I am a very rudimentary user of regex; most of it is beyond my comprehension. I have some eBooks that were clearly produced by less-than-spectacular OCR software, so the formatting ranges from quite good to really bad. One of the main problems is line breaks in the wrong places (e.g. in the middle of a sentence), which make the text very difficult to follow. In F&R I have used [a-z]</p><p class="calibre_1"> (or similar) to find these instances quite successfully. The problem is that the entire match is selected, and I cannot for the life of me work out how to get the replace function to leave the [a-z] part of the match alone. Without that, fixing all the errors can take hundreds of manual interventions. Any assistance is gratefully accepted. Thanks, Paul Last edited by jordy1955; 06-17-2022 at 09:02 PM. | 
|   |   | 
|  06-17-2022, 09:25 PM | #2 | 
| Fanatic            Posts: 531 Karma: 2268308 Join Date: Nov 2015 Device: none | 
			
			Use (?<=\p{Ll})</p>\s*<p class="..."> | 
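A minimal sketch of what the lookbehind in this suggestion does, written in Python. Assumptions: Python's standard `re` module has no `\p{Ll}`, so `[a-z]` stands in for it (ASCII lowercase only); the `calibre_1` class is taken from the first post; the sample text is made up.

```python
import re

# Made-up sample: one sentence wrongly split across two paragraphs.
html = '<p class="calibre_1">The quick brown</p>\n<p class="calibre_1">fox jumped.</p>'

# (?<=[a-z]) only checks that a lowercase letter sits just before </p>.
# The letter is not part of the match, so the replacement leaves it alone.
# [a-z] is a stand-in for \p{Ll}, which the stdlib re module does not support.
pattern = re.compile(r'(?<=[a-z])</p>\s*<p class="calibre_1">')

# Replace the bad paragraph break with a single space.
joined = pattern.sub(" ", html)
print(joined)
```

Because the letter is never selected, the replace field only needs a single space.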
|   |   | 
|  06-17-2022, 09:54 PM | #3 | 
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | |
|   |   | 
|  06-17-2022, 10:08 PM | #4 | 
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | |
|   |   | 
|  06-17-2022, 10:12 PM | #5 | 
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | 
			
			That doesn't work for me and I can't work out what the lookbehind is supposed to do.  I use this search: Code: ([\w,—])</p>\s*<p\s*[^>]*?>([\w]) and this replacement: Code: \1 \2 | 
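A rough illustration of how the search and replacement in this post work together, using Python's `re` module, which accepts the same `\1 \2` replacement syntax. The sample text is made up.

```python
import re

# Made-up sample: a sentence broken across two paragraphs.
html = '<p>It was late, and the</p>\n<p class="calibre_1">rain kept falling.</p>'

# Group 1 captures the last character before </p>.
# Group 2 captures the first character of the next paragraph.
# <p\s*[^>]*?> accepts a <p> tag with or without attributes such as class.
search = re.compile(r'([\w,—])</p>\s*<p\s*[^>]*?>([\w])')

# \1 and \2 put the two captured characters back, separated by a space,
# so only the tags and whitespace between them are actually removed.
fixed = search.sub(r'\1 \2', html)
print(fixed)
```

The two characters are selected by the search, but the replacement writes them straight back, so nothing is lost.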
|   |   | 
|  06-17-2022, 10:18 PM | #6 | 
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | 
			
			This is what my query returns. I need to exclude the single character (in this case the "E"), either from the search result or in the replace function. Last edited by jordy1955; 06-17-2022 at 10:20 PM. Reason: typo | 
|   |   | 
|  06-17-2022, 10:22 PM | #7 | |
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | Quote: 
 This works, BUT it also returns the first character of the following word (see image). How then do I exclude the unwanted characters in the replace field? I've got no idea what the \1 \2 means. Last edited by jordy1955; 06-17-2022 at 10:24 PM. | |
|   |   | 
|  06-17-2022, 10:33 PM | #8 | |
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | Quote: 
 Thank you so much. You have saved me hours of manual intervention and frustration. | |
|   |   | 
|  06-17-2022, 11:03 PM | #9 | |
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | Quote: 
 Another one I have used recently is: Code: ([[:lower:]])\s*</p>\s*<p>\s*([[:lower:]]) This one doesn't cater for the class. If I am doing this amount of fixing, I remove the class from the normal paragraphs first. If any paragraphs with a class are left, it probably means they carry other formatting that I don't want to lose. | |
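A small sketch of how this stricter pattern behaves, in Python. Assumptions: the stdlib `re` module has no POSIX classes, so `[a-z]` stands in for `[[:lower:]]`; the `\1 \2` replacement is carried over from the earlier post; the sample strings are made up.

```python
import re

# [a-z] stands in for [[:lower:]], which the stdlib re module does not support.
pattern = re.compile(r'([a-z])\s*</p>\s*<p>\s*([a-z])')

plain = '<p>the end of a</p>\n<p>broken line</p>'
classed = '<p>the end of a</p>\n<p class="calibre_1">broken line</p>'
new_sentence = '<p>One sentence.</p>\n<p>Another one.</p>'

# Lowercase on both sides of a bare <p>: the paragraphs are joined.
plain_result = pattern.sub(r'\1 \2', plain)
# A <p> with a class attribute is not matched, so it is left as it was.
classed_result = pattern.sub(r'\1 \2', classed)
# A full stop before </p> is not a lowercase letter, so nothing changes.
sentence_result = pattern.sub(r'\1 \2', new_sentence)

print(plain_result)
print(classed_result == classed, sentence_result == new_sentence)
```

Requiring lowercase on both sides makes the pattern less likely to join paragraphs that really are separate.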
|   |   | 
|  06-18-2022, 12:00 AM | #10 | |
| Guru            Posts: 793 Karma: 1538394 Join Date: Sep 2013 Device: Kobo Sage | Quote: 
 Code: ([\w,—])<\/p>\s*<p\s*[^>]*?>([\w]) | |
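The pattern quoted here escapes the slash as `<\/p>`. As a quick check, and assuming Python's `re` module as a stand-in for the editor's regex engine, the escaped and unescaped forms give the same result on a made-up sample:

```python
import re

# Made-up sample text.
sample = '<p>split in the</p>\n<p>middle</p>'

escaped = r'([\w,—])<\/p>\s*<p\s*[^>]*?>([\w])'
unescaped = r'([\w,—])</p>\s*<p\s*[^>]*?>([\w])'

# "/" has no special meaning in a Python pattern, so "\/" is just a literal slash.
escaped_result = re.sub(escaped, r'\1 \2', sample)
unescaped_result = re.sub(unescaped, r'\1 \2', sample)
same = escaped_result == unescaped_result
print(same, escaped_result)
```

The escape is only needed in regex flavours that use `/` as the pattern delimiter.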
|   |   | 
|  06-18-2022, 12:17 AM | #11 | 
| Guru            Posts: 793 Karma: 1538394 Join Date: Sep 2013 Device: Kobo Sage | 
			
			@jordy1955:  I've been using https://regex101.com/ to try various regex things and see what they do. It's been a lot of help. One thing to note, though: the replacement references there use a $ instead of the \ used in Calibre's editor. So, if you wanted to test davidfor's replacement string of: Code: \1 \2 you would enter it there as: Code: $1 $2 | 
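For anyone moving replacement strings between the two notations, here is a minimal sketch in Python. The helper name `dollar_to_backslash` is hypothetical, and it only handles plain numbered references such as `$1`, not forms like `${1}` or `$$`.

```python
import re

def dollar_to_backslash(replacement: str) -> str:
    # Hypothetical helper: rewrite $1, $2, ... as \1, \2, ...
    # In the template, \\ is a literal backslash and \1 is the captured digits.
    return re.sub(r'\$(\d+)', r'\\\1', replacement)

search = r'([\w,—])</p>\s*<p\s*[^>]*?>([\w])'
calibre_style = dollar_to_backslash('$1 $2')

# Made-up sample text, joined using the converted replacement string.
result = re.sub(search, calibre_style, '<p>one</p> <p>two</p>')
print(calibre_style)
print(result)
```

The search pattern itself is the same in both places; only the replacement notation differs.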
|   |   | 
|  06-18-2022, 01:42 AM | #12 | ||
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | Quote: 
 Quote: 
 | ||
|   |   | 
|  06-18-2022, 01:59 AM | #13 | 
| Junior Member  Posts: 7 Karma: 10 Join Date: Aug 2021 Device: Kindle | 
			
			Awesome stuff guys. Just ran it on a book and - once I got my head around it properly - I completed the editing and re-formatting in about 1hr - about 4 hours less than it usually takes me. I'll get much quicker with practice but this is great. Again, thanks SO MUCH. Paul | 
|   |   | 
|  06-18-2022, 02:20 AM | #14 | 
| Wizard            Posts: 2,215 Karma: 8888888 Join Date: Jun 2010 Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite | |
|   |   | 
|  06-18-2022, 09:37 AM | #15 | |
| Guru            Posts: 793 Karma: 1538394 Join Date: Sep 2013 Device: Kobo Sage | Quote: 
 This has been a productive thread for me: I found a much better search/replace for fixing badly split paragraphs, I learned that I could change the behavior of the regex101 site to match Calibre's editor, and some of the search strings I use will be easier now that I won't have to escape the / character. Thanks. | |
|   |   | 