08-08-2014, 12:33 PM | #376 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Find:
Code:
(?<![.!?])(?<=[ ])([A-Z])(?=[a-z]+)
Replace:
Code:
\L\1
You lost a space. Also, you imitated my mistake of offering an uppercasing solution (for letters that are already uppercase) instead of lowercasing. I blame my dental surgery, what's your excuse? (You can blame it on me, I did trick you.)
Last edited by eschwartz; 08-08-2014 at 12:43 PM. |
|
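As a rough illustration of the find/replace above outside the editor: Python's built-in `re` module has no `\L` replacement escape, so this sketch lowercases the captured letter with a callback instead. The function name `lowercase_mid_sentence` and the sample sentence are made up for the example. It also folds the punctuation check and the space into one lookbehind, because a lookbehind placed right next to `(?<=[ ])` only ever inspects the space itself.

```python
import re

# Assumption: a capital is "mid-sentence" when the space before it is not
# preceded by sentence-ending punctuation. The negative lookbehind covers
# both the punctuation mark and the space, so it looks two characters back.
PATTERN = re.compile(r"(?<![.!?] )(?<= )([A-Z])(?=[a-z])")


def lowercase_mid_sentence(text: str) -> str:
    """Lowercase the initial of capitalised words that do not start a sentence."""
    # re has no \L escape, so the replacement is a function.
    return PATTERN.sub(lambda m: m.group(1).lower(), text)


if __name__ == "__main__":
    print(lowercase_mid_sentence("He saw Her. Then She left."))
```

The first word of the text is never touched, simply because nothing precedes it.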
08-08-2014, 12:55 PM | #377 |
Grand Sorcerer
Posts: 27,614
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Might want to replace those old-fashioned [A-Z][a-z] classes with something more Unicode-friendly (such as \p{L} and its uppercase/lowercase variants, \p{Lu} and \p{Ll}). We're not regexing in an ASCII-only world anymore. Even in English texts.
|
08-08-2014, 01:05 PM | #378 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
I could do \w if you want. |
|
08-08-2014, 01:28 PM | #379 |
Grand Sorcerer
Posts: 27,614
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Which would only include non-ASCII characters if regex's Unicode switch is turned on (and even then, it's going to include digits and underscores as well). No... if you're looking to match only letters--but even those letters used to spell naïve and façade correctly--it's \p{L} you're gonna want.
Last edited by DiapDealer; 08-08-2014 at 01:36 PM. |
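To make the difference between the three classes concrete, here is a small sketch using Python's built-in `re` module. The built-in module does not understand `\p{L}`, so `[^\W\d_]` is used as a stand-in for "any letter"; that substitution and the sample string are assumptions of this example, not something from the posts above.

```python
import re

# "naïve façade x_1", written with escapes so the accented letters are
# unambiguous single code points.
sample = "na\u00efve fa\u00e7ade x_1"

# ASCII-only class: the accented letters split the words apart.
ascii_words = re.findall(r"[A-Za-z]+", sample)

# \w on a str pattern is Unicode-aware in Python 3, but it also
# accepts digits and the underscore.
w_words = re.findall(r"\w+", sample)

# "Word character that is neither a digit nor an underscore":
# a stdlib approximation of \p{L}.
letter_words = re.findall(r"[^\W\d_]+", sample)

print(ascii_words)
print(w_words)
print(letter_words)
```

Only the last pattern keeps both accented words whole while leaving `_1` out.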
08-08-2014, 01:37 PM | #380 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
I will keep it in mind for next time. |
|
08-08-2014, 01:44 PM | #381 |
Grand Sorcerer
Posts: 27,614
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Hey, I'm American too! I just happened to notice the phrase "older German grammar" in the original request. Don't want to give advice that might cause them to miss the very stuff they were looking for, do we?
|
08-08-2014, 01:57 PM | #382 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
Fortunately, the easiest part should be replacing the characters to search for with a bigger set. My expertise accurately targeted what I know about, which is the framework behind the search. On which note, we really need calibre editor macros already, since calibre doesn't seem to support the full PCRE. |
|
08-08-2014, 03:18 PM | #383 | |
Grand Sorcerer
Posts: 27,614
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
The only thing I REALLY miss in calibre's regex ATM is the \K functionality. I'm not entirely sure why it's unavailable--since it's certainly part of the Barnett Python regex module that it's employing for its editor's S&R. |
|
08-08-2014, 03:30 PM | #384 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Never seen \K before.
Seems like it is useful mainly as a replacement for lookbehind assertions (while still capturing stuff!) |
08-08-2014, 04:33 PM | #385 | |
Grand Sorcerer
Posts: 27,614
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
But then again... with variable-length lookbehind assertions allowed, it may not be all that hard to replicate \K's functionality! I've just always hated remembering the lookbehind syntax.
Given the string 'hhhhhhhhhhhhhhhhhhhhhhd', it was always easier to search for h+\Kd in Sigil (provided finding a 'd' that follows a potentially unknown number of 'h's was as vitally important to you as it is to me!). Besides the fact that (?<=h+)d wouldn't fly in Sigil, the (?<=) and (?<!) hokum of lookbehinds was (and still is) always difficult for me to remember on the fly. I find it terribly unintuitive.
But now that (?<=h+)d WILL work in calibre's editor... the \K isn't AS vitally important to me--provided I get over the mental stumbling block of remembering the (positive|negative) look(ahead|behind)'s syntax.
So with the exception of the case alteration on replacements, what are you finding you can't do in calibre's regex S&R that you could in Sigil's?
Last edited by DiapDealer; 08-08-2014 at 04:42 PM. |
|
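Where neither `\K` nor variable-length lookbehind is available, the same "find the d, leave the h's alone" effect can be sketched with a capture group that is written back unchanged. The example uses Python's built-in `re`, which supports neither feature; the replacement letter `D` and the variable names are arbitrary choices for illustration.

```python
import re

# A run of h's followed by a single d, as in the example string above.
text = "h" * 22 + "d"

# Emulate h+\Kd: match the run of h's, capture it, and put it back
# unchanged in the replacement so only the d is effectively replaced.
via_group = re.sub(r"(h+)d", r"\1D", text)

# A fixed-width lookbehind is enough for this particular case, because
# "preceded by one or more h" and "preceded by one h" accept the same
# positions.
via_lookbehind = re.sub(r"(?<=h)d", "D", text)

print(via_group)
print(via_lookbehind)
```

Both substitutions give the same result here; the capture-group form is the more general fallback when the text before the target really is variable.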
08-08-2014, 04:36 PM | #386 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
I hear ya!
I just forgot to make lookbehinds look behind today. New tool for the Sigil toolkit, at least. |
08-10-2014, 03:13 AM | #387 |
Wizard
Posts: 1,027
Karma: 11123121
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
@eschwartz,
@mzmm: as to the quotes: In direct speech, for example: "Du, Du willst doch nur ...", the first uppercase 'Du' should be kept, whereas the second should be lowercase. But if the sentence is: "Blah, blah, blah", sagte Er, "Du, Du willst doch nur...", all the personal pronouns ('Er', 'Du') should be lowercase, because the direct speech is only being continued; in the first example it starts the sentence.
BTW: Instead of " ", I use right and left angled quotation marks (guillemets).
Many thanks so far, and I hope it has become clearer! |
08-10-2014, 03:29 AM | #388 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
My previous code assumed a word was preceded by a space, but I stuck in a check that it is NOT preceded by an opening guillemet. Also, using DiapDealer's Unicode property classes.
Find:
Code:
(?<![.!?«])(?<=[ ])(\p{Lu})(?=\p{Ll}+)
Replace:
Code:
\L\1
.!?«
Does that work? If not, I'll try to come up with something more inspired in the morning.
Last edited by eschwartz; 08-10-2014 at 03:34 AM. |
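A Unicode-aware version of the same idea can be sketched with Python's built-in `re`. Since that module has no `\p{Lu}` or `\p{Ll}`, the pattern only finds letter runs after a space and the case test is done in a callback with `str.isupper()` and `str.islower()`. The helper names `lower_initial` and `lowercase_after_space` and the sample text are hypothetical.

```python
import re

# [^\W\d_]+ approximates \p{L}+ in the stdlib. The word must follow a
# space, and that space must not follow sentence-ending punctuation.
CANDIDATE = re.compile(r"(?<![.!?] )(?<= )[^\W\d_]+")


def lower_initial(match: re.Match) -> str:
    word = match.group(0)
    # Only touch words shaped like \p{Lu}\p{Ll}+ (one capital, rest lowercase).
    if len(word) > 1 and word[0].isupper() and word[1:].islower():
        return word[0].lower() + word[1:]
    return word


def lowercase_after_space(text: str) -> str:
    return CANDIDATE.sub(lower_initial, text)


if __name__ == "__main__":
    # «Du, Du willst Ärger»
    print(lowercase_after_space("\u00abDu, Du willst \u00c4rger\u00bb"))
```

A word directly after an opening guillemet is left alone because no space precedes it. Note that this lowercases every capitalised word after a space, nouns such as "Ärger" included.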
08-10-2014, 04:52 AM | #389 |
Wizard
Posts: 1,027
Karma: 11123121
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
For weekend reasons, I don't have the text to be treated available here to test it, but some additional clarification might be necessary.
Does your proposal not match any uppercase letter in the respective context? The point is that nouns in the German language have always been capitalized (at the beginning of the word, of course), and still are today, so they should remain as they are. In the former spelling, however, most words standing in for objects or persons, such as pronouns, were also capitalized, and these have to be written lowercase under the current spelling rules.
So, in English it would be like this: The black Panther was meant to attack Him immediately, but He jumped quickly aside beyond the Wall. Thus, "Panther" and "Wall" should remain uppercase, but "He" and "Him" should turn lowercase. I hope the problem I have has become clearer. |
08-10-2014, 09:52 AM | #390 | |
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
|
Quote:
but yes, the regex would match all uppercase words. there's going to be some issues with a regex that only catches pronouns, for a few reasons i think; one is that the formal Sie/Ihnen should remain uppercase, whereas sie (she) or ihnen (them) should be converted to lower case. also, if one is referring to God, i'm uncertain as to whether that would constitute an uppercase Du, or lowercase du, so you may have to be aware of the context there. anyway, i'd maybe suggest trying something like this:
Find:
Code:
(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b
Replace:
Code:
\1\L\2
unfortunately, you'd then need to go through the text searching for
Find:
Code:
(?<![.!?])(\s«?)(Sie|Ihnen)\b
Replace:
Code:
\1\L\2
also this wouldn't take into account reflexive or possessive pronouns, e.g. meines, deines, seines, ihres etc, but you didn't mention that these were also uppercased. in case they are, then you'd want to add them into the second capturing group separated by a pipe | with the other words. the regex is going to get increasingly complex and brittle if you do need to include all relative, demonstrative, interrogative, etc pronouns, and may in the end not be possible to use reliably.
so, maybe that helps? here's a link to an online editor in case you want to try some more stuff out http://regex101.com/r/lI3yN2/2
Last edited by mzmm; 08-10-2014 at 10:12 AM. |
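The pronoun-list approach can be tried out with a short Python sketch. It uses the built-in `re` module, so the `\1\L\2` replacement is emulated with a callback; the function name `lowercase_pronouns` and the sample sentence are invented for the example, while the pronoun list is copied from the pattern above.

```python
import re

# Pronoun list copied from the post; Sie/Ihnen are deliberately left out
# because the formal forms have to stay uppercase.
PRONOUNS = "Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch"
PATTERN = re.compile(r"(?<![.!?])(\s\u00ab?)(" + PRONOUNS + r")\b")


def lowercase_pronouns(text: str) -> str:
    # Group 1 (whitespace plus optional opening guillemet) is kept as-is;
    # group 2 is lowercased, mirroring the \1\L\2 replacement.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).lower(), text)


if __name__ == "__main__":
    print(lowercase_pronouns(
        "Der Panther wollte Ihn angreifen, aber Er sprang. Er lachte."
    ))
```

Because the lookbehind sits in front of the whitespace group, it inspects the character before the space, so the "Er" that opens the second sentence is kept. Nouns such as "Panther" are untouched since they are not in the list.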