Thread: Regex examples
08-10-2014, 09:52 AM   #390
mzmm
Quote:
Originally Posted by Leonatus
... Does your proposal not match any uppercase letter in the respective context?

The point is that nouns in the German language have always been capitalized.
Ah, right, German nouns. I think I answered too quickly the first time.

But yes, that regex would match every capitalized word.

There are going to be some issues with a regex that only catches pronouns, for a few reasons, I think. One is that the formal Sie/Ihnen should stay capitalized, whereas sie (she/they) and ihnen (them) should be converted to lowercase.

Also, if one is referring to God, I'm not sure whether that calls for an uppercase Du or a lowercase du, so you may have to pay attention to the context there.

Anyway, I'd suggest maybe trying something like this:
Code:
(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b
and then replacing with
Code:
\1\L\2
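The first capturing group, (\s«?), matches a whitespace character optionally followed by a «. The negative lookbehind (?<![.!?]) rejects a match when that whitespace comes right after a period, exclamation mark or question mark, so a pronoun that starts a sentence keeps its capital. In the replacement, \1 puts the space (and any «) back, and \L lowercases the pronoun captured in \2.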
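
If you want to try the same search/replace outside Sigil, here's a rough Python sketch (Python is just my choice for illustration, it isn't part of the thread). Python's re module has no \L in replacement strings, so the sketch assumes a small function that keeps group 1 and lowercases group 2 does the same job. The sample sentence is made up.

```python
import re

# Same search pattern as above; Python supports the fixed-width lookbehind (?<![.!?]).
PRONOUNS = re.compile(
    r"(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b"
)

def lowercase_pronouns(text: str) -> str:
    # Equivalent of the replacement \1\L\2: keep the space (and optional «)
    # from group 1 unchanged, lowercase the pronoun from group 2.
    return PRONOUNS.sub(lambda m: m.group(1) + m.group(2).lower(), text)

sample = "Dann sagte Er, dass Er «Dich» sehen wolle. Ich weiß es."
print(lowercase_pronouns(sample))
# → Dann sagte er, dass er «dich» sehen wolle. Ich weiß es.
```

The "Ich" after "wolle." is left alone because the lookbehind sees the period before the space.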

Unfortunately, you'd then need to go through the text searching for
Code:
(?<![.!?])(\s«?)(Sie|Ihnen)\b
and replacing with
Code:
\1\L\2
or just skipping over each match, depending on the context of the sentence (formal Sie/Ihnen vs. sie/ihnen meaning she, they or them).
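
Since that second pass needs a human decision for each match, one way to sketch it is to show each Sie/Ihnen with some surrounding text and ask whether to keep the capital. This again assumes Python; `review_sie` and its `keep_capital` callback are hypothetical names, and the example answers are hard-coded only so the run is reproducible.

```python
import re
from typing import Callable

FORMAL = re.compile(r"(?<![.!?])(\s«?)(Sie|Ihnen)\b")

def review_sie(text: str, keep_capital: Callable[[str, str], bool], width: int = 30) -> str:
    """Lowercase each Sie/Ihnen match unless keep_capital(word, context) returns True."""
    def repl(m: "re.Match[str]") -> str:
        # Up to `width` characters on each side of the match, for the reviewer.
        context = text[max(0, m.start() - width):m.end() + width]
        if keep_capital(m.group(2), context):
            return m.group(0)  # formal Sie/Ihnen: leave the match as it was
        return m.group(1) + m.group(2).lower()
    return FORMAL.sub(repl, text)

# Interactive use (asks once per match, left to right):
# review_sie(text, lambda w, ctx: input(f"...{ctx}...\nKeep '{w}'? [y/N] ").lower() == "y")

# Reproducible example: keep the first match, lowercase the second.
answers = iter([True, False])
sample = "Herr Müller, kommen Sie bitte herein. Maria sagte, Sie sei müde."
result = review_sie(sample, lambda w, ctx: next(answers))
print(result)
# → Herr Müller, kommen Sie bitte herein. Maria sagte, sie sei müde.
```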

Also, this wouldn't take into account reflexive or possessive pronouns, e.g. meines, deines, seines, ihres, etc., but you didn't mention that these were also capitalized.

If they are, you'd want to add them to the second capturing group, separated from the other words by a pipe (|). The regex will get increasingly complex and brittle if you also need to include relative, demonstrative, interrogative and other pronouns, and in the end it may not be usable reliably.
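
If the word list does grow, it might be easier to keep the words in a list and generate the pipe-separated group than to edit it by hand. A minimal Python sketch, assuming the possessive list below (illustrative, not exhaustive); `build_pattern` and `lowercase_matches` are hypothetical helpers. The generated pattern string should also work pasted into Sigil's search box, but that's an assumption worth checking on regex101 first.

```python
import re
from typing import Iterable

# Personal pronouns from the first search pattern.
PERSONAL = ["Ich", "Mich", "Mir", "Du", "Dich", "Dir", "Er", "Ihn", "Ihm",
            "Ihr", "Es", "Wir", "Uns", "Euch"]
# Possessive forms mentioned above (illustrative only, extend as needed).
POSSESSIVE = ["Meines", "Deines", "Seines", "Ihres"]

def build_pattern(words: Iterable[str]) -> "re.Pattern[str]":
    # Deduplicate, sort longest first (then alphabetically), escape, join with |.
    # Longest-first keeps "Ihres" from being cut short to "Ihr" if the
    # trailing \b is ever removed.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    alternation = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"(?<![.!?])(\s«?)(" + alternation + r")\b")

def lowercase_matches(pattern: "re.Pattern[str]", text: str) -> str:
    # Same effect as the replacement \1\L\2.
    return pattern.sub(lambda m: m.group(1) + m.group(2).lower(), text)

pattern = build_pattern(PERSONAL + POSSESSIVE)
print(pattern.pattern)  # the generated search string
print(lowercase_matches(pattern, "Er nahm das Buch Seines Bruders und gab es Ihm."))
# → Er nahm das Buch seines Bruders und gab es ihm.
```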

So, maybe that helps?

Here's a link to an online regex tester in case you want to try some more things out:

http://regex101.com/r/lI3yN2/2

Last edited by mzmm; 08-10-2014 at 10:12 AM.