I still don't get what you mean. Of course I'm not going to mark the last line of every paragraph manually.
Considering our language barrier, I'll show you with some random text.
1. This is some random text from a novel PDF; its lines end with all sorts of things.
And here is my first regex, which finds every line that ends a paragraph.
2. After the first replace:
3. Tag every remaining line with another tag:
4. After that:
5. Remove every \n:
6. After that:
7. Put the desired \n back:
8. Result:
9. Put the spaces back:
10. Final result:
The rest is to place each paragraph in p tags.
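The steps above can be sketched roughly as follows in Python. This is only an illustration under assumptions: the marker strings `<<END>>` and `<<BRK>>`, the set of paragraph-ending characters, the function names, and the sample text are all made up for the example and are not the exact regexes or tags I used.

```python
import html
import re

# Hypothetical marker strings; any token that cannot occur in the text will do.
END_TAG = "<<END>>"
BRK_TAG = "<<BRK>>"

# Assumed list of characters that may end a paragraph (illustrative, not exhaustive).
END_OF_PARAGRAPH = re.compile(r'([.!?…"”])[ \t]*$', re.MULTILINE)
# Any non-empty line that was not given END_TAG in the first pass.
OTHER_LINE = re.compile(
    r'^(?!.*' + re.escape(END_TAG) + r'$)(.+?)[ \t]*$', re.MULTILINE
)

SAMPLE = (
    '"I never said that," he replied, looking\n'
    'out over the harbour where the boats\n'
    'were coming in.\n'
    'She waited (as she always did\n'
    'when he was thinking: quietly,\n'
    'patiently) for him to go on.\n'
)

def rebuild_paragraphs(text):
    # Tag the lines that end a paragraph.
    text = END_OF_PARAGRAPH.sub(lambda m: m.group(1) + END_TAG, text)
    # Tag every other line with a second marker.
    text = OTHER_LINE.sub(lambda m: m.group(1) + BRK_TAG, text)
    # Remove every newline.
    text = text.replace("\n", "")
    # Put the desired newlines back.
    text = text.replace(END_TAG, "\n")
    # Put the spaces back where lines were merely broken.
    text = text.replace(BRK_TAG, " ")
    return [p for p in text.split("\n") if p]

def to_html(paragraphs):
    # Wrap each rebuilt paragraph in p tags.
    return "\n".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)

if __name__ == "__main__":
    print(to_html(rebuild_paragraphs(SAMPLE)))
```

Note that the sample has broken lines ending in a letter and in a comma; both are joined to the next line because neither character is in the assumed end-of-paragraph list.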
I think it pretty much does what it should. I can't understand why you said it can't handle basic broken lines that end with letters.
As for why I list every punctuation mark myself instead of using \p{P}: that class also matches
1. the opening half of a pair, namely ( [ {
and
2. non-terminal marks such as : , ; -
and so on, none of which you want treated as the end of a paragraph.
A broken line is quite likely to end with one of these, and the false match is entirely avoidable.
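To make the difference concrete, here is a small Python sketch. Python's built-in `re` module has no `\p{P}`, so the "any punctuation" test is approximated with `unicodedata.category`; the explicit list of enders and the sample lines are assumptions made up for the illustration.

```python
import unicodedata

# Assumed explicit list of paragraph-ending characters (illustrative, not exhaustive).
EXPLICIT_ENDERS = set('.!?…"”')

def ends_with_any_punctuation(line):
    # Stand-in for a "\p{P} at end of line" test: the last character
    # belongs to any Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    line = line.rstrip()
    return bool(line) and unicodedata.category(line[-1]).startswith("P")

def ends_with_explicit_ender(line):
    # Only characters from the hand-written list count as a paragraph end.
    line = line.rstrip()
    return bool(line) and line[-1] in EXPLICIT_ENDERS

# Broken (mid-paragraph) lines that happen to end with punctuation.
broken_lines = [
    "She waited (",
    "when he was thinking:",
    "quietly,",
    "a well-",
]

if __name__ == "__main__":
    for line in broken_lines:
        print(repr(line),
              ends_with_any_punctuation(line),
              ends_with_explicit_ender(line))
```

Every one of these broken lines passes the generic punctuation test and would be wrongly tagged as a paragraph end, while the explicit list rejects all of them.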