Quote:
Originally Posted by Leonatus
... Does your proposal not match any uppercase letter in the respective context?
The point is, that nouns in the german language have always been spelled uppercase
ah, right, German nouns. i think i answered too quickly the first time.
but yes, the regex would match all uppercase words.
there are going to be some issues with a regex that only catches pronouns, for a few reasons i think. one is that the formal Sie/Ihnen should remain uppercase, whereas sie (she/they) or ihnen (them) should be converted to lowercase.
also, if one is referring to God, i'm not sure whether that would call for an uppercase Du or a lowercase du, so you may have to be aware of the context there.
anyway, i'd maybe suggest trying something like this:
Code:
(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b
and then replacing with
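the first capturing group (\s«?) matches a whitespace character that may be followed by a «. the lookbehind (?<![.!?]) rejects a match when the character right before that whitespace is a . ! or ?, so capitalized words at the start of a sentence are left alone.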
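if you'd rather do the replacement in a script than in an editor, here's a rough Python sketch of the same idea. it assumes the goal is to keep group 1 (the whitespace and optional «) as-is and lowercase group 2; the names PRONOUN_RE and lowercase_pronouns are just made up for illustration.

```python
import re

# The first pattern from above. The lookbehind only looks at the single
# character before the whitespace, so words right after ". ", "! " or "? "
# (likely sentence starts) are not matched.
PRONOUN_RE = re.compile(
    r"(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b"
)

def lowercase_pronouns(text: str) -> str:
    """Lowercase each matched pronoun, keeping the captured whitespace/« intact."""
    return PRONOUN_RE.sub(lambda m: m.group(1) + m.group(2).lower(), text)

print(lowercase_pronouns("Gestern sagte Er zu Mir: «Du bist spät.» Dann ging Ich."))
# → Gestern sagte er zu mir: «du bist spät.» Dann ging ich.
```

one thing to watch: because the lookbehind only checks one character, a sentence that starts after a closing » (e.g. `spät.» Ich`) is not protected and would get lowercased too.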
unfortunately, you'd then need to go through the text searching for
Code:
(?<![.!?])(\s«?)(Sie|Ihnen)\b
and replacing with
or just skipping over each match, depending on the context of the sentence (formal Sie vs. sie meaning she or they).
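since Sie/Ihnen can't be decided by the regex alone, one way to script the "replace or skip" step is to hand each match to a decision function. this is only a sketch: review_sie and ask_user are hypothetical names, and ask_user is just one possible way to make the call (showing some surrounding text and asking you).

```python
import re
from typing import Callable

# The second pattern from above: Sie/Ihnen not right after . ! or ?
SIE_RE = re.compile(r"(?<![.!?])(\s«?)(Sie|Ihnen)\b")

def review_sie(text: str, is_formal: Callable[[str, re.Match], bool]) -> str:
    """Lowercase Sie/Ihnen only where is_formal() returns False.

    is_formal is a hypothetical callback that gets the whole text and the
    match, and returns True to keep the uppercase (formal) form.
    """
    def repl(m: re.Match) -> str:
        if is_formal(text, m):
            return m.group(0)  # formal Sie/Ihnen: leave the match unchanged
        return m.group(1) + m.group(2).lower()  # she/they/them: lowercase it
    return SIE_RE.sub(repl, text)

def ask_user(text: str, m: re.Match) -> bool:
    """Example decision function: show ~30 chars of context and ask."""
    start = max(0, m.start() - 30)
    print(text[start:m.end() + 30])
    return input("formal Sie/Ihnen? [y/n] ").strip().lower() == "y"
```

you'd then call something like `review_sie(text, ask_user)` and answer y/n for each hit.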
also, this wouldn't take into account reflexive or possessive pronouns, e.g. meines, deines, seines, ihres etc., but you didn't mention that these were also uppercased.
in case they are, you'd want to add them to the second capturing group, separated from the other words by a pipe |. the regex is going to get increasingly complex and brittle if you do need to include all relative, demonstrative, interrogative, etc. pronouns, and in the end it may not be possible to use it reliably.
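if the list does grow, it's probably easier to keep the words in a list and build the alternation from it than to edit the pattern by hand. rough Python sketch; the word list (the pronouns above plus a few of the possessive forms) and the name build_pronoun_re are just assumptions for illustration, so trim or extend it to whatever is actually uppercased in your text.

```python
import re
from typing import Iterable

# Hypothetical word list: the original pronouns plus some possessive forms.
PRONOUNS = [
    "Ich", "Mich", "Mir", "Du", "Dich", "Dir", "Er", "Ihn", "Ihm", "Ihr",
    "Es", "Wir", "Uns", "Euch",
    "Meines", "Deines", "Seines", "Ihres",
]

def build_pronoun_re(words: Iterable[str]) -> re.Pattern:
    # Join the words with | (escaped, just in case) into the second group.
    # The trailing \b stops "Ihr" from matching inside "Ihres"; the engine
    # backtracks and tries the longer alternative instead.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(rf"(?<![.!?])(\s«?)({alternation})\b")

pattern = build_pronoun_re(PRONOUNS)
print(pattern.sub(lambda m: m.group(1) + m.group(2).lower(),
                  "Das ist Deines und nicht Meines."))
# → Das ist deines und nicht meines.
```

note that Ihres has the same formal/informal ambiguity as Sie/Ihnen, so it may belong in the "check by hand" pass rather than here.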
so, maybe that helps?
here's a link to an online editor in case you want to try some more stuff out
http://regex101.com/r/lI3yN2/2