MobileRead Forums - View Single Post

jordy1955 · 06-17-2022, 09:58 PM

Hi,
Firstly let me say that I am a very rudimentary user of regex. Most of it is beyond my comprehension.

I have some eBooks that were clearly produced by less than spectacular OCR software.

Accordingly, the formatting ranges from quite good to really bad.

One of the main problems is line breaks in the wrong places (e.g. in the middle of a sentence), which make the text very difficult to follow.

In F&R I have used this "[a-z]</p><p class="calibre_1">" (or similar) to find these instances quite successfully. The problem is that the entire match is selected, including the letter, and I cannot for the life of me work out how to get the replace function to leave the [a-z] part of the match alone and replace only the tags. Without that, fixing all the errors can mean hundreds of manual interventions.

Any assistance is gratefully accepted.

Thanks,

Paul
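
For reference, the usual way to keep part of a match is either to capture it in a group and put it back in the replacement, or to use a lookbehind so the letter is checked but never becomes part of the match. Below is a minimal sketch in plain Python, on the assumption that the editor's regex engine behaves like Python's for these two features. The sample text and the function names are made up for illustration, and the single space in the replacement assumes the bad break always falls between two words.

```python
import re

# Pattern from the post, with the letter wrapped in a capture group.
GROUP_PATTERN = r'([a-z])</p><p class="calibre_1">'

# Same idea with a lookbehind: the letter must be there but is not consumed.
LOOKBEHIND_PATTERN = r'(?<=[a-z])</p><p class="calibre_1">'


def join_with_group(html: str) -> str:
    """Put the captured letter back with \\1, then add a space."""
    return re.sub(GROUP_PATTERN, r'\1 ', html)


def join_with_lookbehind(html: str) -> str:
    """Replace only the tags; the preceding letter is left untouched."""
    return re.sub(LOOKBEHIND_PATTERN, ' ', html)


# Hypothetical OCR-damaged sample: the first break is mid-sentence,
# the second follows a full stop and should be left alone.
sample = (
    '<p class="calibre_1">The quick brown fox jumped over</p>'
    '<p class="calibre_1">the lazy dog.</p>'
    '<p class="calibre_1">A new paragraph starts here.</p>'
)

print(join_with_group(sample))
print(join_with_lookbehind(sample))
```

If the F&R dialog is in regex mode, the equivalent would presumably be Find: ([a-z])</p><p class="calibre_1"> and Replace: \1 followed by a space. It is worth trying on a copy of the book first, since a paragraph that legitimately ends in a lowercase letter would be joined as well.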

Thread: Need help with regex · Post #1 · Last edited by jordy1955; 06-17-2022 at 10:02 PM.