So, what is a word (for your purposes)? I think it's something like:
Code:
[A-Z0-9][A-Z0-9\.,…’“”!?—-]*
That is: an uppercase letter or digit, followed by any number (as many as possible, possibly zero) of uppercase letters, digits, or punctuation marks. You may want to add “ and ‘ to the "initial" class, and maybe & alongside the letters.
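If you want to poke at that word pattern before using it, here is a rough sketch in Python. I'm assuming Python's `re` module, which may not behave exactly like the regex flavor in your editor, and the `is_word` helper and the sample strings are made up for illustration.

```python
import re

# The word pattern from above: an uppercase letter or digit, then any
# number of uppercase letters, digits, or the listed punctuation marks.
WORD = re.compile(r"[A-Z0-9][A-Z0-9\.,…’“”!?—-]*")

def is_word(text):
    """Return True if the whole string matches the word pattern."""
    return WORD.fullmatch(text) is not None

# Hypothetical samples: the last two fail because of a lowercase letter
# and because “ is not (yet) in the "initial" class.
for sample in ["HELLO", "DON’T", "1984,", "Hello", "“QUOTED”"]:
    print(sample, is_word(sample))
```

The last sample shows why you might want the opening quote marks in the "initial" class.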
Now you want a number of these words separated by spaces. How about this (untested):
Code:
([A-Z0-9][A-Z0-9\.,…’“”!?—-]*\s+)+\b
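Since that one is untested, here is a small Python sketch for trying it out. Again this assumes Python's `re` module rather than your editor's regex engine. The group carries a `+` here so that it can repeat across several words, and the `leading_caps_run` helper and the sample sentence are my own inventions.

```python
import re

# One word: uppercase letter or digit, then uppercase letters, digits,
# or the listed punctuation marks.
WORD = r"[A-Z0-9][A-Z0-9\.,…’“”!?—-]*"

# One or more words, each followed by whitespace. The trailing \b means
# the match must end right before a letter, digit, or underscore.
RUN = re.compile(r"(" + WORD + r"\s+)+\b")

def leading_caps_run(text):
    """Return the run of all-caps words at the start of text, or ''."""
    match = RUN.match(text)
    return match.group(0) if match else ""

# Hypothetical sample text.
print(repr(leading_caps_run("IT WAS A DARK night in 1984.")))
# prints 'IT WAS A DARK '
```

Note that the match includes the trailing space, so you may need to account for that in the replacement.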