Thread: Regex examples
Old 03-04-2014, 04:14 AM   #305
mzmm
you could try this:

Code:
find:
(?<=\P{Greek})(\p{Greek}+)-(?!\1)

replace:
\1
it's looking for one or more greek characters in a capturing group:

(\p{Greek}+)

that are preceded by anything other than a greek character:

(?<=\P{Greek})

then a hyphen:

-

that is not followed by the group it matched previously:

(?!\1)

replacing the match with \1 keeps the captured greek characters and just drops the hyphen
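
for anyone who wants to try the same idea outside sigil, here's a rough python sketch of the find/replace above. it's an approximation, not the exact expression: python's built-in re module has no \p{Greek}, so i'm assuming the Greek and Coptic block (U+0370-U+03FF) as a stand-in, and the names GREEK, PATTERN and remove_hyphen are made up for the example.

```python
import re

# Assumption: Python's re module does not support \p{Greek}, so the
# Greek and Coptic block U+0370-U+03FF stands in for it here.
GREEK = r"\u0370-\u03FF"

# The same four pieces as in the walkthrough above.
PATTERN = re.compile(
    rf"(?<=[^{GREEK}])"   # preceded by anything other than a greek character
    rf"([{GREEK}]+)"      # one or more greek characters, captured as group 1
    r"-"                  # then a hyphen
    r"(?!\1)"             # not followed by the text group 1 just matched
)

def remove_hyphen(text: str) -> str:
    """Replace each match with group 1, which drops the hyphen."""
    return PATTERN.sub(r"\1", text)

print(remove_hyphen("x αβγ-δεζ"))   # hyphen removed
print(remove_hyphen("x αβγ-αβγ"))   # left alone: the same word follows the hyphen
```

the second call is left untouched because the look-ahead sees the captured word repeated right after the hyphen.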

** edit **

i was trying to get this to work with unicode ranges so that it could be simplified further (no need for the look-behind), but couldn't seem to get it working in sigil, or my other text editor which has PCRE, for that matter.

i was trying to match [\u0370-\u03FF] and (?-u)[\u0370-\u03FF] with no success. anyone have tips on this?

** edit 2 **

i was hoping to get rid of the look-behind by starting the expression with a word boundary, but it turns out \b only recognises ASCII word characters by default, i.e. [a-zA-Z0-9_], so greek letters don't count as word characters. looks like the look-behind may be necessary in these cases.
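
the \b behaviour is easy to reproduce in python, as a loose illustration only: python's re is not PCRE, and i'm using its re.ASCII flag to imitate the ASCII-only mode described above. the variable names are just for the example.

```python
import re

text = "x αβγ"

# re.ASCII restricts \w (and therefore \b) to [a-zA-Z0-9_], imitating
# the ASCII-only behaviour described above: the space and the greek
# letter are both "non-word", so there is no boundary between them.
ascii_hit = re.search(r"\bαβγ", text, re.ASCII)

# Python's default for str patterns is Unicode-aware, so the greek
# letters count as word characters and \b matches before them.
unicode_hit = re.search(r"\bαβγ", text)

print(ascii_hit)     # None
print(unicode_hit)   # a match object for the greek word
```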

here's an updated version based on Doitsu's comment below that includes Greek_Extended in the search pattern:

Code:
(?<![\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}])([\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]+)-(?!\1)
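
a rough python equivalent of the updated pattern, in case it helps with testing. assumptions: PCRE's \x{...} escapes are rewritten as \u.... for python's re module, and the names GREEK_RANGES, UPDATED and remove_hyphen_updated are made up here.

```python
import re

# Assumption: \x{0370} in PCRE is written \u0370 for Python's re module.
# Greek and Coptic (U+0370-U+03FF) plus Greek Extended (U+1F00-U+1FFF).
GREEK_RANGES = r"\u0370-\u03FF\u1F00-\u1FFF"

UPDATED = re.compile(
    rf"(?<![{GREEK_RANGES}])"   # NOT preceded by a greek character
    rf"([{GREEK_RANGES}]+)"     # one or more greek characters, group 1
    r"-"                        # the hyphen
    r"(?!\1)"                   # not followed by group 1 again
)

def remove_hyphen_updated(text: str) -> str:
    """Replace each match with group 1, which drops the hyphen."""
    return UPDATED.sub(r"\1", text)

# U+1F00 is in Greek Extended, so this word is only caught by the wider range.
sample = "\u1F00\u03B2-\u03B3\u03B4"
print(remove_hyphen_updated(sample) == "\u1F00\u03B2\u03B3\u03B4")  # True
```

one side effect of switching from (?<=...) to the negative look-behind (?<!...): it also matches a word at the very start of the text, where there is no preceding character for a positive look-behind to find.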

Last edited by mzmm; 03-04-2014 at 07:34 AM.