Question: Is there an actual space before the final closing </p>? And can it actually be relied upon?
In my experience, I wouldn't trust this at all, and would check each one on a case-by-case basis. I definitely wouldn't rely on a blind Replace All.
Regex Solutions
I would handle this specific cleanup in a few passes.
First, make sure that you SAVE A COPY before you do anything. Then make sure you don't press Replace All unless you know exactly what you are doing (and have tested a few to make sure the Regex is working properly). Even then, make sure you do a code comparison of the Before/After to make sure you didn't delete key parts of the text.
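If your editor has no built-in comparison tool, the Before/After check can be sketched outside the editor. This is only a minimal illustration in Python using the standard difflib module; the file names and the sample strings are made up for the example.

```python
import difflib

def show_changes(before: str, after: str) -> list[str]:
    """Return unified-diff lines between the original and the edited markup."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before.html",  # hypothetical labels, not real files
            tofile="after.html",
            lineterm="",
        )
    )

# Made-up sample: a page number and a paragraph break removed by a replace.
before = (
    '<p class="calibre2">end of page 12</p>\n'
    '<p class="calibre2">next page</p>'
)
after = '<p class="calibre2">end of page next page</p>'

for line in show_changes(before, after):
    print(line)
```

Lines starting with "-" were removed and lines starting with "+" were added, so any deleted text that was not a page number stands out.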
Before Examples
I would just do a simple Search and Replace to strip out all:
<p class="calibre2"></p>
and
<p class="calibre2"/>
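For anyone who wants to try this first pass outside the editor, here is a rough Python equivalent. It is just a sketch: the function name and the sample markup are my own, and it does a literal (non-regex) replace of exactly the two strings above.

```python
# The two empty-paragraph forms listed above, matched literally.
EMPTY_PARAGRAPHS = ('<p class="calibre2"></p>', '<p class="calibre2"/>')

def strip_empty_paragraphs(html: str) -> str:
    """Remove both empty-paragraph forms; everything else is left untouched."""
    for tag in EMPTY_PARAGRAPHS:
        html = html.replace(tag, "")
    return html

# Made-up sample with one of each empty form between two real paragraphs.
sample = (
    '<p class="calibre2">Text</p>\n'
    '<p class="calibre2"></p>\n'
    '<p class="calibre2"/>\n'
    '<p class="calibre2">More</p>'
)
cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

The blank lines left behind are harmless, because the later searches use \s+ and will match across them.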
Example #1-3
If you run the above Search/Replaces, then example #1-3 can be condensed into this:
Search: [0-9]+</p>\s+<p class="calibre2">
Replace: *BLANK OR A SPACE*
Note: In these examples, red denotes the part of the Regex that matches the page numbers.
Note: In English, the red portion ([0-9]+) says "look for 1 or more digits in a row". The blue portion (\s+) says "look for 1 or more whitespace characters".
Note: There can be legitimate usages of numbers (for example, years/dates/ages). Be careful.
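To see why the warning about legitimate numbers matters, here is a small Python sketch of the same search. It assumes Python's re syntax behaves like the editor's for this simple pattern; the preview helper and both sample strings are invented for illustration.

```python
import re

# Same pattern as the Search field above.
PAGE_NUMBER = re.compile(r'[0-9]+</p>\s+<p class="calibre2">')

def preview(html: str) -> list[str]:
    """List every match so each one can be checked before replacing."""
    return [m.group(0) for m in PAGE_NUMBER.finditer(html)]

# A paragraph split by a page number (made-up text).
split_paragraph = (
    '<p class="calibre2">the quick brown 57</p>\n'
    '<p class="calibre2">fox jumped.</p>'
)
# A paragraph that legitimately ends in a number (made-up text).
legit_number = (
    '<p class="calibre2">He was born in 1984</p>\n'
    '<p class="calibre2">A new paragraph.</p>'
)

print(preview(split_paragraph))
print(preview(legit_number))  # this matches too, which is the danger

# Replace with BLANK here, because a space already precedes the number.
joined = PAGE_NUMBER.sub("", split_paragraph)
print(joined)
```

The regex cannot tell "57" from "1984", which is why each hit needs a look before it is replaced.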
Example #4
Search: [IXVL]+</p>\s+<p class="calibre2">
Replace: *BLANK OR A SPACE*
Note: In English, the red portion ([IXVL]+) says "look for 1 or more of 'I', 'X', 'V', or 'L' in a row". This should match Roman numerals like "IX", "XIII", "XXIV".
Note: "I" is used very often in English, so be careful.
Note: Make sure you have the "Case-sensitive" button turned on.
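The two warnings above can be checked with a quick Python sketch. Python's re is case-sensitive by default, which stands in for the "Case-sensitive" button; the sample sentences are invented, and the pattern is the same one shown in the Search field.

```python
import re

# Case-sensitive by default, like having the "Case-sensitive" button on.
ROMAN_PAGE = re.compile(r'[IXVL]+</p>\s+<p class="calibre2">')

sample_roman = (
    '<p class="calibre2">as the preface says XIV</p>\n'
    '<p class="calibre2">and the text continues.</p>'
)
# Ends in the pronoun "I": a false positive the regex cannot avoid.
sample_pronoun = (
    '<p class="calibre2">So did I</p>\n'
    '<p class="calibre2">Next paragraph.</p>'
)
# Ends in a lowercase word made of the same letters.
sample_lower = (
    '<p class="calibre2">It was civil</p>\n'
    '<p class="calibre2">Next paragraph.</p>'
)

print(bool(ROMAN_PAGE.search(sample_roman)))    # real page numeral
print(bool(ROMAN_PAGE.search(sample_pronoun)))  # false positive on "I"
print(bool(ROMAN_PAGE.search(sample_lower)))    # safe while case-sensitive

# With case-insensitivity, the "ivil" in "civil" is matched as well.
print(bool(re.search(ROMAN_PAGE.pattern, sample_lower, re.IGNORECASE)))
```

The pronoun case still has to be caught by eye, but case-sensitivity removes a large class of false hits on ordinary lowercase words.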
Example #5
Search: \[[0-9]+\]
Replace: *BLANK OR A SPACE*
Note: In English, red says "look for a left bracket" + "look for 1 or more numbers in a row" + "look for a right bracket".
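As a quick check that the escaped brackets only catch numbers, here is the same pattern in a short Python sketch. The sample sentence is invented, and I am assuming Python's re treats this expression the same way the editor does.

```python
import re

# \[ and \] match literal brackets; [0-9]+ matches the digits between them.
BRACKETED_NUMBER = re.compile(r'\[[0-9]+\]')

sample = 'the end of the page [23] and the text continues [sic] here.'

found = BRACKETED_NUMBER.findall(sample)
cleaned = BRACKETED_NUMBER.sub("", sample)

print(found)
print(cleaned)
```

Bracketed words such as "[sic]" are left alone. Replacing with BLANK leaves a double space in this sample, so whether to use BLANK or A SPACE depends on how the surrounding text is spaced.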
After Examples
Beyond that point, you stated that hyphens should be removed... I would strongly recommend against this. Each one of these has to be checked on a case-by-case basis. The hyphen may actually be a hard hyphen (for example, the word "all-purpose" might have been broken across pages).
For checking hyphens at the end of paragraphs, I personally run this regex:
Search: -</p>\s+<p>
Replace: *BLANK*
It shouldn't be too bad manually correcting these. In reality, you only have to check a handful of hyphens that were at the end of pages.
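If you would rather get a list of the suspects than step through them one by one, something like this Python sketch works. It only reports matches and never replaces anything. The helper name, the context width, and the sample text are my own; the pattern is the plain <p> version from above, so it would need the class attribute added for files like yours.

```python
import re

# Same pattern as the Search field above (plain <p>, no class attribute).
TRAILING_HYPHEN = re.compile(r'-</p>\s+<p>')

def list_trailing_hyphens(html: str, context: int = 20) -> list[str]:
    """Return each match with some surrounding text for manual review."""
    hits = []
    for m in TRAILING_HYPHEN.finditer(html):
        start = max(0, m.start() - context)
        end = min(len(html), m.end() + context)
        hits.append(html[start:end])
    return hits

# Made-up sample: one hard hyphen ("all-purpose") and one soft hyphen.
sample = (
    '<p>This is an all-</p>\n'
    '<p>purpose cleaner with a conti-</p>\n'
    '<p>nuous flow.</p>'
)

for hit in list_trailing_hyphens(sample):
    print(repr(hit))
```

Here "all-" needs to keep its hyphen when the paragraphs are joined, while "conti-" does not, which is exactly the decision a Replace All cannot make.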
I would highly recommend learning at least the basics of Regex:
http://www.regular-expressions.info/quickstart.html
There is also a huge "Regex examples" thread in the Sigil section of the forums:
https://www.mobileread.com/forums/sho...d.php?t=167971
These examples you posted are relatively easy.
Side Note: Thanks for saving your example images as PNG. Vastly superior compared to people who post screenshots as JPG.