Old 07-19-2013, 05:01 AM   #1
avantman42
Wizard
Posts: 1,090
Karma: 6058305
Join Date: Sep 2010
Location: UK
Device: Kindle Paperwhite
Finding missing Oxford Commas

In school, I was always taught not to use Oxford Commas, but I've recently decided to use them because I'm writing non-fiction and clarity is important to me. However, it's difficult to unlearn 30 years of not using them, so I've tried to come up with a regular expression to find missing Oxford Commas.

This is the best I've come up with so far:
Code:
[a-zA-Z0-9]+, [a-zA-Z0-9]+ and
(there is a space after the and but it's getting stripped by the forum software).

It's not perfect, but it's the best I've come up with. Can anyone improve on it?
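For anyone who wants to try the pattern outside a word processor, here is a minimal Python sketch. The sample sentences and the function name `find_missing_oxford` are made up for illustration, and Python's `re` module is only one regex engine; others may differ in detail.

```python
import re

# The pattern from the post, with the trailing space after "and" restored.
PATTERN = re.compile(r"[a-zA-Z0-9]+, [a-zA-Z0-9]+ and ")

def find_missing_oxford(text):
    """Return every substring shaped like 'word, word and '."""
    return PATTERN.findall(text)

samples = [
    "We bought apples, oranges and pears.",    # missing Oxford comma
    "We bought apples, oranges, and pears.",   # already has one
    "We bought five apples, three oranges and two pears.",  # multi-word items
]
for s in samples:
    print(find_missing_oxford(s))
```

Only the first sample is reported. The third one slips through because each list item is more than one word.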
Old 07-19-2013, 08:55 AM   #2
gmw
cacoethes scribendi
Posts: 5,809
Karma: 137770742
Join Date: Nov 2010
Location: Australia
Device: Kobo Aura One & H2Ov2, Sony PRS-650
If you're using OpenOffice/LibreOffice then you can use the end-of-word flag, \>. (The effect is probably no different, but it's shorter to type.) Using [:alnum:] is not much shorter, but I would hope (though I don't actually know) that it is not limited to matching only ASCII alphanumerics.

Code:
\>, [:alnum:]+ and
An additional margin of safety may be had by allowing for extra or missing spaces:
Code:
\> *, *[:alnum:]+ +and
Or a variation without alnum that works in at least the basic instances (not sure if unusual punctuation might upset it):
Code:
\> *, *[^ ,]+ +and
None of these are really any better/smarter than what you already had.

If you're using some other software then check which variation of regex it uses, so that you can duplicate the effect of the above.
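As one example of porting to another engine, here is a rough Python sketch of the three variations. The translations are assumptions rather than exact equivalents: Python's `re` has neither `\>` nor `[:alnum:]`, so `\b` (word boundary) and `[^\W_]` (any word character except underscore, which covers non-ASCII letters for `str` patterns in Python 3) stand in for them. The trailing space after "and" is also assumed, and `matching_variants` is a made-up helper name.

```python
import re

# Assumed translations: \b stands in for \> and [^\W_] for [:alnum:].
VARIANTS = {
    "strict": re.compile(r"\b, [^\W_]+ and "),
    "loose spacing": re.compile(r"\b *, *[^\W_]+ +and "),
    "no alnum": re.compile(r"\b *, *[^ ,]+ +and "),
}

def matching_variants(text):
    """Return the names of the variants that match somewhere in text."""
    return [name for name, rx in VARIANTS.items() if rx.search(text)]

print(matching_variants("apples, oranges and pears"))
print(matching_variants("apples ,oranges  and pears"))  # odd spacing
print(matching_variants("café, crème and thé"))         # non-ASCII letters
```

The second line is the case the extra `*` and `+` are for: only the two spacing-tolerant variants should report it.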
Old 07-19-2013, 09:35 AM   #3
avantman42
Wizard
I'm currently using it in LibreOffice, but ideally I'd like it to work with Geany and grep, too. I think I'll probably use alnum, but unless the other changes give me an improvement, I probably won't use them. I'm not a RegExp guru by any means, but I understand what I've currently got, and there's some value in that.
Old 07-19-2013, 11:34 AM   #4
gmw
cacoethes scribendi
I was thinking a bit more about this, and so far what you have will only work for single-word lists: "apples, oranges and pears", but you won't find: "five apples, three oranges and two pears". And I might suggest that it is the more complicated list that is the more important one if additional clarity is your goal.

For a more comprehensive match I think you are going to need something like:
Code:
\>,[^,.;"“”]+ and
This will almost certainly find more than you want, probably with a lot of false positives, but you are less likely to miss important instances. The quotes are there to stop the search going over quote boundaries. I included plain as well as open/close quotes; you probably can't exclude an apostrophe (because you "can't"). The full stop is there to stop it at the end of a sentence (obviously), and the semi-colon because it should represent a superior phrase boundary.
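To see the trade-off in practice, here is a small Python sketch of the broader pattern. As before, `\b` is an assumed stand-in for `\>`, the trailing space after "and" is assumed, and the test sentences and the name `broad_hits` are invented for illustration.

```python
import re

# \b approximates LibreOffice's \> here; the trailing space is assumed.
BROAD = re.compile(r'\b,[^,.;"“”]+ and ')

def broad_hits(text):
    """Return each span from the comma up to and including ' and '."""
    return [m.group(0) for m in BROAD.finditer(text)]

print(broad_hits("five apples, three oranges and two pears"))
print(broad_hits("After lunch, we walked home and slept."))  # a false positive
print(broad_hits("I like tea, really. Bob and Al left."))    # stopped by the full stop
```

The multi-word list is now found, but so is the second sentence, which is not a list at all. The third sentence is not reported because the full stop ends the search before "and" is reached.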

You may be able to limit the number of false positives by limiting the amount of text permitted between the comma and the " and":
Code:
\>,[^,."“”]{1,30} and
But of course this risks missing some of the worst offenders.
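A quick Python sketch of that risk, under the same assumptions as before (`\b` standing in for `\>`, an assumed trailing space after "and"). The helper names `bounded` and `hits`, the `limit` parameter, and the sample sentences are all made up for illustration.

```python
import re

def bounded(limit=30):
    # Build the length-limited pattern; limit caps the text between
    # the comma and " and ".
    return re.compile(r'\b,[^,."“”]{1,%d} and ' % limit)

def hits(text, limit=30):
    """Return each span from the comma up to and including ' and '."""
    return [m.group(0) for m in bounded(limit).finditer(text)]

short = "five apples, three oranges and two pears"
long_ = "the old house, the garden that nobody had tended for years and the shed"

print(hits(short))            # short item: found
print(hits(long_))            # item longer than 30 characters: missed
print(hits(long_, limit=60))  # found again with a looser limit
```

The long sentence is exactly the kind of list where the extra comma helps most, and it is the one the 30-character limit skips.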

ETA: I specified the quotes because of what I saw when testing on my fiction writing. In your non-fiction it may be that you would be better off without the quotes in the exclude list, in case your lists include quoted items.

Last edited by gmw; 07-19-2013 at 11:42 AM.
Old 07-19-2013, 12:01 PM   #5
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
To be honest I'd just search for the word "and" and do it manually on a case-by-case basis.
Old 07-19-2013, 08:35 PM   #6
gmw
cacoethes scribendi
Quote:
Originally Posted by HarryT
To be honest I'd just search for the word "and" and do it manually on a case-by-case basis.
But, Harry, just think of the fun you would be missing by not spending hours and hours learning the wonderfully esoteric syntax of regular expressions.
Old 07-20-2013, 03:29 AM   #7
avantman42
Wizard
Quote:
Originally Posted by gmw
You may be able to limit the number of false positives by limiting the amount of text permitted between the comma and the " and":
Code:
\>,[^,."“”]{1,30} and
But of course this risks missing some of the worst offenders.
I'll experiment, but it doesn't have to be perfect. My editor is very good at spotting missing Oxford Commas. Plus, of course, I'm gradually getting used to adding them as I write.
All times are GMT -4.