Finding missing Oxford Commas

avantman42 · 07-19-2013, 05:01 AM

In school, I was always taught not to use Oxford Commas, but I've recently decided to use them because I'm writing non-fiction and clarity is important to me. However, it's difficult to unlearn 30 years of not using them, so I've tried to come up with a regular expression to find missing Oxford Commas.

This is the best I've come up with so far:

Code:

[a-zA-Z0-9]+, [a-zA-Z0-9]+ and

(There is a space after the "and", but it's getting stripped by the forum software.)

It's not perfect, but it's the best I've come up with. Can anyone improve on it?
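For anyone who wants to try the expression outside a word processor, here is a minimal Python sketch of it, with the trailing space restored. The function name and the sample sentence are made up for illustration; only the pattern comes from the post.

```python
import re

# The pattern from the post, with the trailing space put back.
PATTERN = re.compile(r"[a-zA-Z0-9]+, [a-zA-Z0-9]+ and ")

def find_candidates(text):
    """Return the substrings that look like a list missing its Oxford comma."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# Hypothetical sample: the first sentence lacks the comma, the second has it.
sample = "I bought apples, oranges and pears. She likes tea, coffee, and cocoa."
print(find_candidates(sample))
```

Only the first sentence should be reported, since the second already has a comma before "and".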

gmw · 07-19-2013, 08:55 AM

If you're using OpenOffice/LibreOffice then you can use the end-word flag. (The effect is probably no different, but it's shorter to type.) Using [:alnum:] is not much shorter but I would hope (but don't actually know) it would not be limited to matching only ASCII alphanumerics.

Code:

\>, [:alnum:]+ and

An additional margin of safety may be had by allowing for extra or missing spaces:

Code:

\> *, *[:alnum:]+ +and

Or a variation without alnum that works in at least the basic instances (not sure if unusual punctuation might upset it):

Code:

\> *, *[^ ,]+ +and

None of these are really any better/smarter than what you already had.

If you're using some other software then check which variation of regex they are using to duplicate the effect of the above.
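As one example of carrying this over to another regex flavour, here is a rough Python sketch of the space-tolerant variant. It assumes that Python's `\b` is a fair stand-in for the end-word flag and that `\w` is close enough to [:alnum:] (it also matches the underscore). In Python 3, `\w` matches non-ASCII letters by default. The function name and test strings are invented for illustration.

```python
import re

# Rough Python stand-in for the LibreOffice pattern "\> *, *[:alnum:]+ +and ":
#   \b replaces the end-of-word flag \> (assumed equivalent here)
#   \w replaces [:alnum:]; note that \w also matches "_"
TOLERANT = re.compile(r"\b *, *\w+ +and ")

def has_missing_comma(text):
    """True if the text contains 'word, word and ' with flexible spacing."""
    return TOLERANT.search(text) is not None

print(has_missing_comma("We serve café, crème and thé."))   # accented letters
print(has_missing_comma("apples ,oranges  and pears"))      # odd spacing
print(has_missing_comma("apples, oranges, and pears"))      # comma already there
```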

avantman42 · 07-19-2013, 09:35 AM

I'm currently using it in LibreOffice, but ideally I'd like it to work with Geany and grep, too. I think I'll probably use alnum, but if I'm not getting an improvement, I'll probably not use the other changes. I'm not a RegExp guru by any means, but I understand what I've currently got, and there's some value in that.

gmw · 07-19-2013, 11:34 AM

I was thinking a bit more about this, and so far what you have will only work for single-word lists: "apples, oranges and pears", but you won't find: "five apples, three oranges and two pears". And I might suggest that it is the more complicated list that is the more important one if additional clarity is your goal.

For a more comprehensive match I think you are going to need something like:

Code:

\>,[^,.;"“”]+ and

This will almost certainly find more than you want, probably a lot of false positives, but you are less likely to miss important instances. The quotes are there to stop the search going over quote boundaries. I included plain as well as open/close; you probably can't exclude an apostrophe (because you "can't"). The full stop is there to stop it at the end of a sentence (obviously), and the semi-colon because it should represent a superior phrase boundary.
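A small Python sketch of the difference between the single-word pattern and the broader one, again assuming `\b` can stand in for the end-word flag. The names and the sample sentences are hypothetical; the last sample is there to show the kind of false positive the broader pattern produces.

```python
import re

# Single-word pattern from the first post (trailing space restored).
SINGLE_WORD = re.compile(r"[a-zA-Z0-9]+, [a-zA-Z0-9]+ and ")

# Broader pattern: after a word-ending comma, allow anything except a comma,
# full stop, semi-colon, or quote mark before " and ".
BROAD = re.compile(r'\b,[^,.;"“”]+ and ')

def matches(pattern, text):
    """Return every substring of text that the pattern flags."""
    return [m.group(0) for m in pattern.finditer(text)]

multi_word = "five apples, three oranges and two pears"
not_a_list = "After lunch, we walked home and slept."

print(matches(SINGLE_WORD, multi_word))  # the multi-word list slips through
print(matches(BROAD, multi_word))        # the broader pattern catches it
print(matches(BROAD, not_a_list))        # ...but it also flags a non-list
```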

You may be able to limit the number of false positives by limiting the amount of text permitted between the comma and the " and":

Code:

\>,[^,."“”]{1,30} and

But of course this risks missing some of the worst offenders.
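To make the trade-off concrete, here is a short Python sketch of the length-limited pattern as posted, with `\b` assumed as a stand-in for the end-word flag. The sample sentences are invented: the first has a short middle item, the second a long one.

```python
import re

# Length-limited pattern: at most 30 characters between the comma and " and ".
LIMITED = re.compile(r'\b,[^,."“”]{1,30} and ')

def flagged(text):
    """True if the length-limited pattern flags the text."""
    return LIMITED.search(text) is not None

short_list = "five apples, three oranges and two pears"
long_list = ("We packed tents, a dozen sleeping bags rated for deep winter "
             "and a stove.")

print(flagged(short_list))  # middle item is well under 30 characters
print(flagged(long_list))   # middle item is longer than 30 characters
```

The second sentence is exactly the sort of long, complicated list where the missing comma matters most, and the limit lets it through.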

ETA: I specified the quotes because of what I saw when testing on my fiction writing. In your non-fiction it may be that you would be better off without the quotes in the exclude list, in case your lists include quoted items.

HarryT · 07-19-2013, 12:01 PM

To be honest I'd just search for the word "and" and do it manually on a case-by-case basis.

gmw · 07-19-2013, 08:35 PM

Quote:

Originally Posted by HarryT

To be honest I'd just search for the word "and" and do it manually on a case-by-case basis.

But, Harry, just think of the fun you would be missing by not spending hours and hours learning the wonderfully esoteric syntax of regular expressions.

avantman42 · 07-20-2013, 03:29 AM

Quote:

Originally Posted by gmw

You may be able to limit the number of false positives by limiting the amount of text permitted between the comma and the " and":

Code:

\>,[^,."“”]{1,30} and

But of course this risks missing some of the worst offenders.

I'll experiment, but it doesn't have to be perfect. My editor is very good at spotting missing Oxford Commas. Plus, of course, I'm gradually getting used to adding them as I write.
