Quote:
Originally Posted by lomkiri
If I understand correctly, you want to transform 3 dots (with or without spaces) in ellipsis, always remove a space before, and always remove a space after except if it's a letter.
You can do this with a regex-function :
Code:
— search :
\s?(?:(?:\. ?\. ?\. ?)|…) ?(.)?
— replace (in mode regex-function):
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    end = match[1] or '' # be sure is not None
    space = ' ' if end.isalpha() else ''
    return '…' + space + end
It's been on my agenda to try to standardize the various ellipses I run across in books. Your Search, above, pretty much solves that for me. But, for my use, I find that I don't need the Regex-Function. With a minor modification to your search, a plain old Replace works for what I need:
Code:
Search: \s?(?:(?:\.\s?\.\s?\.\s?)|…)(\s?.)?
(EDIT: I've replaced the plain spaces with \s because some books use non-breaking spaces and a plain space didn't pick those up.)
Replace (as Regex): …\1
For the Search, all I've done is moved the opening parenthesis of your 2nd group (well, the only "selecting" group) to include the space if present. For the Replace, I just replace whatever form of spaces and ellipses the book already has, with just an actual ellipsis and tack on whatever the book currently has following it.
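In case anyone wants to try the plain Search/Replace outside calibre, here is a rough sketch using Python's standard `re` module. It assumes `re` treats this pattern the same way calibre's search does, and the function name `standardize_ellipses` is made up for illustration.

```python
import re

# Search pattern exactly as posted above.
SEARCH = re.compile(r'\s?(?:(?:\.\s?\.\s?\.\s?)|…)(\s?.)?')

def standardize_ellipses(text):
    # Equivalent of replacing with "…\1". A function is used so that an
    # unmatched group 1 (ellipsis at the very end of the text) becomes ''.
    return SEARCH.sub(lambda m: '…' + (m.group(1) or ''), text)

print(standardize_ellipses('Wait … what?'))   # Wait… what?
print(standardize_ellipses('Well. . .'))      # Well…
print(standardize_ellipses('"No\u00a0..."'))  # "No…"
```

The last sample has a non-breaking space before the dots, which `\s` matches in Python 3 string patterns.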
The problem with ellipses is that there don't seem to be any hard-and-fast rules for them. The best I've found (and there are a lot of contradictions) are:
- If it's the old-fashioned 3-dot ellipsis with spaces between the dots, then there should be spaces before and after the ellipsis.
- If the 3 dots are without intervening spaces, then there shouldn't be spaces before or after.
- Sentences ending with an ellipsis and a period or comma should have the period/comma first and then the ellipsis.
- Sentences ending with question or exclamation marks should have the ellipsis first and then the question/exclamation mark.
Personally, I think rule 3, above, is for the birds. In general, trailing ellipses seem to be for thoughts trailing off. Not for sentences trailing off (which is what the rule implies to me). And, for rules 1 and 2, we're replacing those 3 dots with an actual ellipsis, so they don't really apply.
Soft rules I've found for ellipses seem to say there should always be spaces before and after unless one bumps up against a closing quote. In that case, the ellipsis bumps right up to the closing quote.
But most of what I see in actual books has opening ellipses right up against the start of sentences, closing ellipses right up against the end of the sentences, and embedded ellipses adjacent to the previous bit of sentence with a space before the sentence resumes (I think).
And, that's why I changed your search and replace. If I've worked through it correctly, all I've got it doing is stripping off any leading spaces, replacing any 3-dot ellipses with real ellipses and using whatever trailing spaces the author/publisher decided to use.
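One thing worth double-checking in that walk-through: in the Search as posted, the 3-dot branch itself ends in `\s?`, so a single whitespace after three dots is consumed before the capture group gets a chance at it. The sketch below (Python's `re` module, assumed to behave like calibre's search here) compares the posted pattern with a hypothetical variant that drops that last `\s?`. The variant and the helper name `run` are not from the thread.

```python
import re

# Pattern as posted in this thread.
AS_POSTED = re.compile(r'\s?(?:(?:\.\s?\.\s?\.\s?)|…)(\s?.)?')
# Hypothetical variant: no \s? after the third dot, so group 1 sees the space.
VARIANT = re.compile(r'\s?(?:(?:\.\s?\.\s?\.)|…)(\s?.)?')

def run(pattern, text):
    # Same as replacing with "…\1", with an unmatched group treated as ''.
    return pattern.sub(lambda m: '…' + (m.group(1) or ''), text)

for sample in ('and then ... nothing', 'and then … nothing'):
    print(repr(run(AS_POSTED, sample)), repr(run(VARIANT, sample)))
# 'and then…nothing' 'and then… nothing'
# 'and then… nothing' 'and then… nothing'
```

So with the posted Search, the trailing space survives only where the book already uses a real ellipsis character. Whether that matters depends on how the book's ellipses were typed.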
That seems to do what I need. So, thanks for the regex. It sure made things easier for me.
EDIT: and what is it with Pratchett and ellipses? I just opened a book of his to test this and there are 828 ellipses in the book.