Thread: Regex examples
05-07-2019, 01:27 PM   #581
Klecks
Hey famfam,

You can't catch every possible styling variant with a single regex.

First, try to get rid of all the inline styling. Alternatively, leave the colon out of the search expression, so that it no longer matches inside style attributes:
Code:
(?<!St|Mr|Mrs|Dr|<|\d)(<p[^>]*>\s*|\!|\?|\.|…)( ?|\s*</?[^b|^>]*>\s*?)(»|«|“|”|„)?( ?|\s*</?[^>]*>\s*?)(?<!<b>)([A-ZÖÄÜ])
Otherwise the regex can break the markup, as your example shows:

Code:
<p style="line-height:&lt;b&gt;1&lt;/b&gt;2pt; text-align:justify"><span style="font-family:&lt;b&gt;C&lt;/b&gt;alibri, sans-serif">
is broken. It should be:
Code:
<p style="line-height:12pt; text-align:justify"><span style="font-family:Calibri, sans-serif">
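Getting rid of the inline styling first, as suggested above, could look roughly like this in Python. This is only a sketch: the helper name strip_inline_styles is made up for illustration, and it assumes that the style attributes are double-quoted and that nothing in the book depends on them.

```python
import re

# Matches a double-quoted style attribute, including the whitespace before it.
# Assumption: attribute values are always written with double quotes.
STYLE_ATTR = re.compile(r'\s+style\s*=\s*"[^"]*"')


def strip_inline_styles(html: str) -> str:
    """Remove every style="..." attribute (hypothetical helper)."""
    return STYLE_ATTR.sub('', html)


sample = ('<p style="line-height:12pt; text-align:justify">'
          '<span style="font-family:Calibri, sans-serif">')
print(strip_inline_styles(sample))
```

With the style attributes gone, a colon in the search expression can no longer hit things like font-family:Calibri.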
To catch the first letter of a sentence that follows a footnote/endnote marker, as in your example, you need an extra regex.
Try:
Search for:
Code:
(<p[^>]*>\s*|\!|\?|\.|…)(»|«|“|”|„)?((?:\s*</?[^>]*>\s*)*)(\[?\d+\]?)((?:</?[^>]*>)*)(\*)?((?:\s*</?[^>]*>\s*)*)(?<!<b>)([A-ZÖÄÜ])
Replace with:
Code:
\1\2\3\4\5\6\7<b>\8</b>
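If you want to try the search and replace above outside your editor, here is a minimal sketch using Python's re module. The function name bold_after_note and the sample markup are made up for illustration (they are not taken from your book), and the regex engine in your editor may behave slightly differently from Python's.

```python
import re

# Search and replace expressions copied unchanged from the post above.
SEARCH = re.compile(
    r'(<p[^>]*>\s*|\!|\?|\.|…)(»|«|“|”|„)?((?:\s*</?[^>]*>\s*)*)'
    r'(\[?\d+\]?)((?:</?[^>]*>)*)(\*)?((?:\s*</?[^>]*>\s*)*)'
    r'(?<!<b>)([A-ZÖÄÜ])'
)
REPLACE = r'\1\2\3\4\5\6\7<b>\8</b>'


def bold_after_note(html: str) -> str:
    """Bold the capital letter that follows a footnote marker (hypothetical helper)."""
    return SEARCH.sub(REPLACE, html)


# Invented sample: a footnote link followed by a styled sentence.
sample = ('<p>He left.<a href="#fn12"><sup>[12]</sup></a> '
          '<span>Next day he returned.</span></p>')
result = bold_after_note(sample)
print(result)
```

Note that the expression only matches when the capital letter comes directly after the marker or after a tag. The (?<!<b>) lookbehind keeps a second run from bolding the same letter again.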
But in the next book the styling could be completely different, and then the regex has to be adjusted.



Good luck.
Klecks.