Old 08-08-2014, 01:33 PM   #376
eschwartz
Irrational Optimist
Posts: 8,449
Karma: 15616579
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by mzmm View Post
is that working for you in sigil? my PCRE editor doesn't match

i'd probably use something like

find
Code:
(?<![.!?])( [A-Z])(?=[a-z])
replace
Code:
\U\1\E


but yes, some examples of exactly what you want to match would help, it's a little unclear
Yes, I forgot some things, like the part where lookbehinds need to look behind.

Find:
Code:
(?<![.!?])(?<=[ ])([A-Z])(?=[a-z]+)
Replace. Keep in mind that \E -- end of modifier's action -- is not strictly necessary if the entire replacement is being flagged as lowercase:
Code:
\L\1
This time I actually checked in Sigil.

You lost a space. Also, you imitated my mistake of offering an uppercasing solution (for letters that are already uppercase) instead of lowercasing. I blame my dental surgery, what's your excuse? (You can blame it on me, I did trick you.)
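
For anyone who wants to try the find/replace above outside Sigil, here is a rough Python sketch. It is not the Sigil engine: Python's standard `re` module has no `\L`, so a replacement function stands in for `\L\1`. One thing worth checking with it: in the posted pattern both lookbehinds examine the same single character in front of the capital, so a capital after ". " still matches. The `variant` pattern, which moves the space inside the negative lookbehind, is only an assumption about the intent and was not posted in the thread.

```python
import re

# Pattern as posted: both lookbehinds inspect the one character before the capital.
posted = re.compile(r"(?<![.!?])(?<=[ ])([A-Z])(?=[a-z]+)")

# Hypothetical variant: reject "sentence punctuation + space" as a two-character unit.
variant = re.compile(r"(?<![.!?] )(?<=[ ])([A-Z])(?=[a-z]+)")


def lower_first(match):
    # Stand-in for the \L\1 replacement, which Python's re does not offer.
    return match.group(1).lower()


sample = "He said that He was tired. Then He left."
posted_out = posted.sub(lower_first, sample)
variant_out = variant.sub(lower_first, sample)

print(posted_out)
print(variant_out)
```

With the posted pattern the "Then" after the full stop is lowercased as well; with the variant it is left alone.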

Last edited by eschwartz; 08-08-2014 at 01:43 PM.
Old 08-08-2014, 01:55 PM   #377
DiapDealer
Grand Sorcerer
Posts: 9,551
Karma: 44291176
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Might want to replace those old-fashioned [A-Z][a-z] classes with something more unicode-friendly (such as \p{L} and its uppercase/lowercase variants). We're not regexing in an ascii-only world anymore. Even in English texts.
Old 08-08-2014, 02:05 PM   #378
eschwartz
Irrational Optimist
Quote:
Originally Posted by DiapDealer View Post
Might want to replace those old-fashioned [A-Z][a-z] classes with something more unicode-friendly (such as \p{L} and its uppercase/lowercase variants). We're not regexing in an ascii-only world anymore. Even in English texts.
I am an American heathen who doesn't know anything about this "foreign language" stuff. ASCII is all that matters.

I could do \w if you want.
Old 08-08-2014, 02:28 PM   #379
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by eschwartz View Post
I could do \w if you want.
Which would only include non-ASCII characters if regex's Unicode switch is turned on (and even then, it's going to include digits and underscores as well). No... if you're looking to match only letters--but even those letters used to spell naïve and façade correctly--it's \p{L} you're gonna want.
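
The difference is easy to see in a small Python sketch. This uses the standard `re` module, which does not understand `\p{L}` at all, so `[^\W\d_]` is used here as a rough stand-in for "letters only". That stand-in is an approximation chosen for this example, not the same thing as `\p{L}` in PCRE or in the third-party `regex` module.

```python
import re

# Escapes keep the accented letters precomposed: "naïve façade_2".
sample = "na\u00efve fa\u00e7ade_2"

# str patterns are Unicode-aware by default in Python 3.
unicode_words = re.findall(r"\w+", sample)

# With the ASCII flag, \w stops at every accented letter.
ascii_words = re.findall(r"\w+", sample, re.ASCII)

# Word characters minus digits and underscore: roughly "letters only".
letters_only = re.findall(r"[^\W\d_]+", sample)

print(unicode_words)
print(ascii_words)
print(letters_only)
```

The Unicode-aware `\w` still swallows `_2`, and the ASCII-only `\w` breaks both words apart, which is the point being made about `\w` versus a real letter class.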

Last edited by DiapDealer; 08-08-2014 at 02:36 PM.
Old 08-08-2014, 02:37 PM   #380
eschwartz
Irrational Optimist
Quote:
Originally Posted by DiapDealer View Post
Which would only include non-ASCII characters if regex's Unicode switch is turned on (and even then, it's going to include digits and underscores as well). No... if you're looking to match only letters--but even those letters used to spell naïve and façade correctly--it's \p{L} you're gonna want.
See above, under american heathen.

I will keep it in mind for next time.
Old 08-08-2014, 02:44 PM   #381
DiapDealer
Grand Sorcerer
Hey, I'm American too! I just happened to notice the phrase "older German grammar" in the original request. Don't want to give advice that might cause them to miss the very stuff they were looking for, do we?
Old 08-08-2014, 02:57 PM   #382
eschwartz
Irrational Optimist
Quote:
Originally Posted by DiapDealer View Post
Hey, I'm American too! I just happened to notice the phrase "older German grammar" in the original request. Don't want to give advice that might cause them to miss the very stuff they were looking for, do we?
And I learned something new myself, today.

Fortunately, the easiest part should be replacing the characters to search for with a bigger set. My expertise accurately targeted what I know about, which is the framework behind the search. On which note, we really need calibre editor macros already, since calibre doesn't seem to support the full PCRE.
Old 08-08-2014, 04:18 PM   #383
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by eschwartz View Post
On which note, we really need calibre editor macros already, since calibre doesn't seem to support the full PCRE.
It supports a whole big bunch of it. Anything missing is mainly on the replacement side of things--like the (upper|lower)case thing (which--let's face it--is pretty specialized/gimmicky to begin with). The calibre editor's regex engine also gains us a few things that other regex flavors don't have, like variable-length lookbehinds and the shorthand classes \m and \M (which match the beginning and end of words respectively), as opposed to just \b (word boundary).
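
In engines that lack \m and \M, the same two anchors can usually be spelled with \b plus a lookaround. A small Python sketch using the standard `re` module follows; the names `WORD_START` and `WORD_END` are made up for illustration, and this is an emulation, not a description of how calibre's engine implements them.

```python
import re

# \b followed by a word character: the boundary at the start of a word.
WORD_START = r"\b(?=\w)"
# \b preceded by a word character: the boundary at the end of a word.
WORD_END = r"\b(?<=\w)"

text = "cat concat cathode"

# Offsets where "cat" begins a word ("cat", "cathode").
starts = [m.start() for m in re.finditer(WORD_START + "cat", text)]
# Offsets where "cat" ends a word ("cat", "concat").
ends = [m.start() for m in re.finditer("cat" + WORD_END, text)]

print(starts)
print(ends)
```

A plain \b cannot tell these two cases apart, which is why the dedicated start-of-word and end-of-word anchors are handy.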

The only thing I REALLY miss in calibre's regex ATM is the \K functionality. I'm not entirely sure why it's unavailable--since it's certainly part of the Barnett Python regex module that it's employing for its editor's S&R.
Old 08-08-2014, 04:30 PM   #384
eschwartz
Irrational Optimist
Never seen \K before.

Seems like it is useful mainly as a replacement for lookbehind assertions (while still capturing stuff!)
Old 08-08-2014, 05:33 PM   #385
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by eschwartz View Post
Never seen \K before.

Seems like it is useful mainly as a replacement for lookbehind assertions (while still capturing stuff!)
Actually, I could be wrong. \K may not be a part of the regex module being used in calibre. I will miss it very much if so.

But then again... with variable-length lookbehind assertions allowed, it may not be all that hard to replicate \K's functionality!

I've just always hated remembering the lookbehind syntax:
When using the string 'hhhhhhhhhhhhhhhhhhhhhhd':
It was always easier to search for h+\Kd in Sigil (provided finding a 'd' that follows a potentially unknown number of 'h's was as vitally important to you as it is to me!). Besides the fact that (?<=h+)d wouldn't fly in Sigil, the (?<=) and (?<!) hokum of lookbehinds was (and still is) always difficult for me to remember on the fly. I find it terribly unintuitive.

But now that (?<=h+)d WILL work in calibre's editor ... the \K isn't AS vitally important to me--provided I get over the mental stumbling block of remembering the (positive|negative) look(ahead|behind)'s syntax.
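
For engines that offer neither \K nor variable-length lookbehind, the usual fallback is to capture the prefix and write it back in the replacement. A minimal Python sketch with the standard `re` module, which is one such engine (the uppercase "D" replacement is just an arbitrary marker for this example):

```python
import re

text = "hhhhhhhhhhhhhhhhhhhhhhd"

# The standard-library engine rejects a variable-length lookbehind outright.
try:
    re.compile(r"(?<=h+)d")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Fallback: capture the run of h's and put it back with \1.
result = re.sub(r"(h+)d", r"\1D", text)

print(lookbehind_ok)
print(result)
```

The capture-and-restore form changes only the final character, which is the same effect h+\Kd or (?<=h+)d would have where they are supported.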

So with the exception of the case alteration on replacements, what are you finding you can't do in calibre's regex S&R that you could in Sigil's?

Last edited by DiapDealer; 08-08-2014 at 05:42 PM.
Old 08-08-2014, 05:36 PM   #386
eschwartz
Irrational Optimist
I hear ya!

I just forgot to make lookbehinds look behind today.

New tool for the Sigil toolkit, at least.
Old 08-10-2014, 04:13 AM   #387
Leonatus
Addict
Posts: 239
Karma: 263322
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
@eschwartz,
@mzmm:

As to the quotes: in direct speech, for example:

"Du, Du willst doch nur ...",

the first upper case 'Du' should be maintained, whereas the second should be lower case.

But if the sentence is:

"Blah, blah, blah", sagte Er, "Du, Du willst doch nur...",

all the personal pronouns ('Er', 'Du') should be lower case, for the direct speech is only continued; in the first example it is starting the sentence.

BTW: Instead of " ", I use right and left angled quotation marks (guillemets).

Many thanks so far, and I hope it has become clearer!
Old 08-10-2014, 04:29 AM   #388
eschwartz
Irrational Optimist
My previous code assumed a word was preceded by a space, but I stuck in a check that it is NOT preceded by an opening guillemet. Also, I'm now using DiapDealer's Unicode character properties.

Find:
Code:
(?<![.!?«])(?<=[ ])(\p{Lu})(?=\p{Ll}+)
Replace:
Code:
\L\1
Finds a space, capital letter, lowercase letter, assuming it is not preceded by one of these punctuation characters:
.!?«

Does that work? If not, I'll try to come up with something more inspired in the morning.

Last edited by eschwartz; 08-10-2014 at 04:34 AM.
Old 08-10-2014, 05:52 AM   #389
Leonatus
Addict
Because it is the weekend, I don't have the text available here to test it, but some additional clarification might be necessary.

Does your proposal not match any uppercase letter in the respective context?

The point is that nouns in the German language have always been capitalized (at the beginning of the word, of course), and still are today, so they should remain as they are. In the former spelling, however, most words standing in for objects or persons, such as pronouns, were capitalized as well, whereas current spelling rules require them to be lowercase. So, in English it would be like this:

The black Panther was meant to attack Him immediately, but He jumped quickly aside beyond the Wall.

Thus, the "Panther" and the "Wall" should remain uppercase, but "He" and "Him" should turn lowercase.

I hope the problem I have has become clearer.
Old 08-10-2014, 10:52 AM   #390
mzmm
Groupie
Posts: 163
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Quote:
Originally Posted by Leonatus View Post
... Does your proposal not match any uppercase letter in the respective context?

The point is, that nouns in the german language have always been spelled uppercase
ah, right, German Nouns. think i answered too quickly the first time.

but yes, the regex would match all uppercase words.
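
A quick way to see this is a Python sketch run on the English sample sentence from the post above. It is only an approximation of the posted search: ASCII classes stand in for \p{Lu} and \p{Ll}, which the standard `re` module lacks, and a function stands in for `\L\1`.

```python
import re

# ASCII stand-in for the posted (?<![.!?«])(?<=[ ])(\p{Lu})(?=\p{Ll}+).
generic = re.compile(r"(?<![.!?«])(?<=[ ])([A-Z])(?=[a-z]+)")

sentence = (
    "The black Panther was meant to attack Him immediately, "
    "but He jumped quickly aside beyond the Wall."
)

# Lowercase the captured capital, as \L\1 would.
generic_out = generic.sub(lambda m: m.group(1).lower(), sentence)

print(generic_out)
```

Both the pronouns and the nouns ("Panther", "Wall") come out lowercase, so a pattern based on letter case alone cannot keep the nouns capitalized.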

there's going to be some issues with a regex that only catches pronouns, for a few reasons i think; one is that the formal Sie/Ihnen should remain uppercase, whereas sie (she) or ihnen (them) should be converted to lower case.

also, if one is referring to God, i'm uncertain as to whether that would constitute an uppercase Du, or lowercase du, so you may have to be aware of the context there.

anyway, i'd maybe suggest trying something like this:
Code:
(?<![.!?])(\s«?)(Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch)\b
and then replacing with
Code:
\1\L\2
the first capturing group (\s«?) is looking for a space that may be followed by a «.
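
the same search and replace can be sketched in Python for testing. this is only an approximation of what Sigil does: the standard `re` module has no `\L`, so a function rebuilds the `\1\L\2` replacement, and the name `lowercase_pronouns` is made up for this example.

```python
import re

PRONOUNS = "Ich|Mich|Mir|Du|Dich|Dir|Er|Ihn|Ihm|Ihr|Es|Wir|Uns|Euch"
pattern = re.compile(r"(?<![.!?])(\s«?)(" + PRONOUNS + r")\b")


def lowercase_pronouns(text):
    # Keep group 1 (space plus optional «) and lowercase group 2, like \1\L\2.
    return pattern.sub(lambda m: m.group(1) + m.group(2).lower(), text)


continued = "«Blah, blah, blah», sagte Er, «Du, Du willst doch nur...»"
opening = "«Du, Du willst doch nur ...»"
after_stop = "Er kam. «Du, Du willst doch nur ...»"

print(lowercase_pronouns(continued))
print(lowercase_pronouns(opening))
print(lowercase_pronouns(after_stop))
```

in the continued speech all three pronouns are lowercased, while a «Du that opens the text or follows a full stop keeps its capital.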

unfortunately, you'd then need to go through the text searching for
Code:
(?<![.!?])(\s«?)(Sie|Ihnen)\b
and replacing with
Code:
\1\L\2
or just skipping over it based on the context of the sentence (formal Sie or female sie)

also this wouldn't take into account reflexive or possessive pronouns, e.g. meines, deines, seines, ihres etc., but you didn't mention that these were also uppercased.

in case they are, then you'd want to add them into the second capturing group separated by a pipe | with the other words. the regex is going to get increasingly complex and brittle if you do need to include all relative, demonstrative, interrogative, etc pronouns, and may in the end not be possible to use reliably.

so, maybe that helps?

here's a link to an online editor in case you want to try some more stuff out

http://regex101.com/r/lI3yN2/2

Last edited by mzmm; 08-10-2014 at 11:12 AM.