Quote:
Originally Posted by Jellby
Or a quote mark (in any flavour: ', ", », ’, ”; including German versions: “, ‘, «), or a closing parenthesis (or bracket), or a dash, or a colon, or a question mark, or an ellipsis (…, as a single character), or the HTML entity for any of them (which ends in a semicolon)...
Well OK, but
I did mention the quote (mark).
I am not fussed about foreign versions - I only read English text.
The question mark is a fair cop.
When I was taught English grammar at school, colon and dash were not valid sentence endings,
so I would not necessarily expect a line feed to follow either of those. The 3-dot ellipsis will be caught by the test for a single full stop.
I am not convinced that a line feed should always follow an ellipsis (which is good, as that would be a hard case to add to the regex!)
I guess there will never be "one regex to rule them all" but I am happy if I can automate 99% of repairs
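A rough sketch of the kind of test described above, in Python. This is not the actual regex from this thread: the `LINE_END_OK` set, the `BROKEN_BREAK` pattern and the `join_broken_lines` name are all made up for illustration, and it assumes plain text with `\n` line feeds and a blank line between paragraphs. A line feed is kept only when the character before it is a full stop, question mark, exclamation mark or a quote mark; anything else (colon, dash, the single-character ellipsis) is treated as a broken line and joined.

```python
import re

# Assumed set of characters that may legitimately come right before a line
# feed: full stop, question mark, exclamation mark, straight and curly quotes.
LINE_END_OK = ".?!\"'’”"

# A line feed counts as "broken" when the character before it is neither
# whitespace nor in LINE_END_OK, and the next line starts with real text.
# Blank lines (paragraph breaks) fail both conditions and are left alone.
BROKEN_BREAK = re.compile(
    r"(?<=[^\s" + re.escape(LINE_END_OK) + r"])\n(?=\S)"
)


def join_broken_lines(text: str) -> str:
    """Replace each broken line feed with a single space."""
    return BROKEN_BREAK.sub(" ", text)


if __name__ == "__main__":
    sample = "It was a dark\nand stormy night.\nThe rain fell"
    print(join_broken_lines(sample))
```

A three-dot ellipsis ends in a full stop, so it passes the same test as an ordinary sentence ending without needing its own case. Adding or removing characters in `LINE_END_OK` is the only change needed to be stricter or more lenient.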
PS. According to this link
http://grammar.about.com/od/punctuat...punctrules.htm
there are only 3 valid ways to end an English sentence,
but of course not all authors/publishers abide by the rules.