Hey famfam,
You can't catch every possible styling variant with one regex.
First, try to get rid of all the inline styling. Alternatively, leave the colon out of the search expression:
Code:
(?<!St|Mr|Mrs|Dr|<|\d)(<p[^>]*>\s*|\!|\?|\.|…)( ?|\s*</?[^b|^>]*>\s*?)(»|«|“|”|„)?( ?|\s*</?[^>]*>\s*?)(?<!<b>)([A-ZÖÄÜ])
Otherwise, the regex can break things, as your example shows:
Code:
<p style="line-height:<b>1</b>2pt; text-align:justify"><span style="font-family:<b>C</b>alibri, sans-serif">
is broken. It should be:
Code:
<p style="line-height:12pt; text-align:justify"><span style="font-family:Calibri, sans-serif">
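If you want to check this outside the editor, here is a small sketch using Python's standard `re` module. It assumes the replacement for the expression above is `\1\2\3\4<b>\5</b>` (bold the capital, keep everything else). It also adds a hypothetical variant with `:` in the sentence-end group to show where `<b>C</b>alibri` comes from. Python's `re` doesn't accept the mixed-length lookbehind `(?<!St|Mr|Mrs|Dr|<|\d)`, so it is split into one lookbehind per branch, which means the same thing. The `<b>1</b>2pt` part isn't reproduced: `[A-ZÖÄÜ]` never matches a digit, so that presumably came from a different expression.

```python
import re

# Python's re only accepts fixed-width lookbehinds, so the original
# (?<!St|Mr|Mrs|Dr|<|\d) is split into one negative lookbehind per branch.
# Together they mean the same: "not preceded by any of these".
LOOKBEHIND = r"(?<!St)(?<!Mr)(?<!Mrs)(?<!Dr)(?<!<)(?<!\d)"

# Everything after the sentence-end group, copied from the post.
# Inside [^b|^>] the '|' and '^' are literal characters, so in practice
# it means "any character except b and >".
REST = r"( ?|\s*</?[^b|^>]*>\s*?)(»|«|“|”|„)?( ?|\s*</?[^>]*>\s*?)(?<!<b>)([A-ZÖÄÜ])"

# Colon-free version, as posted.
WITHOUT_COLON = re.compile(LOOKBEHIND + r"(<p[^>]*>\s*|\!|\?|\.|…)" + REST)
# Hypothetical variant that also treats ':' as the end of a sentence.
WITH_COLON = re.compile(LOOKBEHIND + r"(<p[^>]*>\s*|\!|\?|\.|…|:)" + REST)

# Assumed replacement: keep groups 1-4, wrap the capital letter in <b>.
REPLACEMENT = r"\1\2\3\4<b>\5</b>"

styled = ('<p style="line-height:12pt; text-align:justify">'
          '<span style="font-family:Calibri, sans-serif">')
plain = "<p>Er kam. Dann ging er.</p>"

print(WITH_COLON.sub(REPLACEMENT, styled))     # ...font-family:<b>C</b>alibri, sans-serif">
print(WITHOUT_COLON.sub(REPLACEMENT, styled))  # unchanged
print(WITHOUT_COLON.sub(REPLACEMENT, plain))   # <p><b>E</b>r kam. <b>D</b>ann ging er.</p>
```

The optional quotation-mark group may not take part in a match; since Python 3.5, `re.sub` substitutes an empty string for such a group, which mirrors what the editor does.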
To catch the first letter of a sentence that follows a footnote/endnote marker, as in your example, you need an additional regex.
Try:
Search for:
Code:
(<p[^>]*>\s*|\!|\?|\.|…)(»|«|“|”|„)?((?:\s*</?[^>]*>\s*)*)(\[?\d+\]?)((?:</?[^>]*>)*)(\*)?((?:\s*</?[^>]*>\s*)*)(?<!<b>)([A-ZÖÄÜ])
Replace with:
Code:
\1\2\3\4\5\6\7<b>\8</b>
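Here is the same kind of Python sketch for the footnote expression, with each group commented. The two test paragraphs are made up (I don't have your file here), so treat them as assumptions about what the markup looks like.

```python
import re

# Search expression from the post; it works unchanged in Python's re,
# because its only lookbehind, (?<!<b>), is fixed-width.
FOOTNOTE = re.compile(
    r"(<p[^>]*>\s*|\!|\?|\.|…)"   # 1: paragraph start or sentence end
    r"(»|«|“|”|„)?"               # 2: optional quotation mark
    r"((?:\s*</?[^>]*>\s*)*)"     # 3: tags before the note number (whitespace only around tags)
    r"(\[?\d+\]?)"                # 4: note number, optionally in [ ]
    r"((?:</?[^>]*>)*)"           # 5: tags directly after the number
    r"(\*)?"                      # 6: optional asterisk
    r"((?:\s*</?[^>]*>\s*)*)"     # 7: further tags (whitespace only around tags)
    r"(?<!<b>)([A-ZÖÄÜ])"         # 8: capital letter, not already bold
)
REPLACEMENT = r"\1\2\3\4\5\6\7<b>\8</b>"

# Hypothetical test paragraphs, not the ones from the original question.
with_tags = '<p>Das war alles.<sup><a href="#fn1">1</a></sup> Danach kam nichts.</p>'
bare_space = "<p>Ende.[2] Weiter geht es.</p>"

print(FOOTNOTE.sub(REPLACEMENT, with_tags))
# <p>Das war alles.<sup><a href="#fn1">1</a></sup> <b>D</b>anach kam nichts.</p>
print(FOOTNOTE.sub(REPLACEMENT, bare_space))
# unchanged
```

Note the second case: the pattern only allows whitespace next to a tag, so a bare space between `[2]` and `W` is not matched. That is exactly the kind of detail that has to be adapted to the markup of each book.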
But the next book may be styled completely differently, and then the regex will have to be adjusted.
Good luck.
Klecks.