Regex examples - Page 19

crutledge · 07-28-2013, 02:58 PM

Quote:

Originally Posted by Doitsu

I'm sure that there's a more elegant solution, but the following simple regex should work:

Find: ([[:upper:]]{1,})([[:lower:]]+)
Replace: \1\U\2\E

Since this simple regex will find title case strings everywhere, you can't use it with Replace All, though.

To replace two consecutive title case words use the following regex:

Find: ([[:upper:]]{1,})([[:lower:]]+) ([[:upper:]]{1,})([[:lower:]]+)
Replace: \1\U\2\E \3\U\4\E

Thank you sir. I will give it a try.
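
For anyone who wants to try the idea outside Sigil, here is a rough Python sketch of the same substitution. Python's built-in re module has no \U...\E in replacements and no [[:upper:]] classes, so this version assumes ASCII ranges and does the upper-casing in a small function; small_caps_upper is just a made-up name for illustration.

```python
import re

def small_caps_upper(text):
    # One or more capitals followed by one or more lower-case letters
    # (ASCII only in this sketch, unlike the POSIX classes in the post).
    pattern = re.compile(r"([A-Z]+)([a-z]+)")
    # Keep group 1 as typed and upper-case group 2; this stands in for \1\U\2\E.
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)

result = small_caps_upper("Chapter One began")
print(result)
```

Every title-case word gets converted, which is the reason the quoted post warns against using it with Replace All.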

mzmm · 07-29-2013, 12:04 PM

here's another version using a look behind.

Code:

find:
(?<=[A-Z])([a-z]+)

replace:
<small>\U\1\E</small>
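
A quick way to see what the look-behind does is to run it in Python. This is only a sketch: Python's re module lacks \U...\E, so the upper-casing is done in a function, and wrap_small is a hypothetical helper name.

```python
import re

def wrap_small(text):
    # (?<=[A-Z]) requires an upper-case letter just before the match
    # without consuming it, so only the lower-case run is replaced.
    return re.sub(
        r"(?<=[A-Z])([a-z]+)",
        lambda m: "<small>" + m.group(1).upper() + "</small>",
        text,
    )

wrapped = wrap_small("Tom said")
print(wrapped)
```

The initial capital stays outside the <small> tag because the look-behind never becomes part of the match.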

trebor6691 · 08-04-2013, 03:03 PM

One of the things I spend the most time editing is bad paragraph breaks. For instance, Tom continued his paragraph

on another line.

The easiest way so far is to regex Search:

</span></p> <p class="calibre9"><span class="calibre6">[a-z]

then manually <shift> arrow left, and hit space. I would love to Search for the paragraph starting with a lowercase letter, but leave the letter intact and Replace everything before it with the space so that I can replace all at once.

Any help would be greatly appreciated.

theducks · 08-04-2013, 03:13 PM

Quote:

Originally Posted by trebor6691

One of the things I spend the most time editing is bad paragraph breaks. For instance, Tom continued his paragraph

on another line.

The easiest way so far is to regex Search:

</span></p> <p class="calibre9"><span class="calibre6">[a-z]

then manually <shift> arrow left, and hit space. I would love to Search for the paragraph starting with a lowercase letter, but leave the letter intact and Replace everything before it with the space so that I can replace all at once.

Any help would be greatly appreciated.

You were almost there.

Find:

Code:

(?sm)</span></p>\s+<p class="calibre9"><span class="calibre6">([a-z])

Replace:

Code:

(a space here)\1

The \1 (backslash 1) puts back the letter captured by the parentheses above.
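
The same find and replace can be tried in Python, which also accepts \1 in the replacement string. This is a minimal sketch: the class names calibre9 and calibre6 come from the post above, while the sample text and the function name join_broken_paragraphs are made up for the test.

```python
import re

BROKEN_BREAK = re.compile(
    r'</span></p>\s+<p class="calibre9"><span class="calibre6">([a-z])'
)

def join_broken_paragraphs(html):
    # Swap the closing and opening tags for a single space and
    # put the captured lower-case letter back with \1.
    return BROKEN_BREAK.sub(r" \1", html)

sample = (
    '<p class="calibre9"><span class="calibre6">Tom continued his paragraph</span></p>\n\n'
    '<p class="calibre9"><span class="calibre6">on another line.</span></p>'
)
joined = join_broken_paragraphs(sample)
print(joined)
```

Paragraphs that start with a capital letter are left alone, because the capture group only accepts a-z.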

trebor6691 · 08-04-2013, 03:18 PM

That is awesome. Many more uses for the \1 now. Thanks a bunch.

theducks · 08-04-2013, 04:09 PM

Quote:

Originally Posted by trebor6691

That is awesome. Many more uses for the \1 now. Thanks a bunch.

\1 is only the notation for the 1st capture,
just like \9 would be the 9th capture (never been past \4 myself).

It is the search term that is the magic

Get a REGEX Cheatsheet and keep it handy.

Leonatus · 08-14-2013, 01:31 PM

Did I overlook a Regex that forces uppercase after period, exclamation mark, question mark and white space, in the case of initial quotation mark without white space?

Thanks in advance!

Leonatus · 08-15-2013, 03:21 AM

Well, I checked it out by myself and found that [.\!?] [a-z] will find any lowercase after punctuation marks, with or without whitespace.

theducks · 08-15-2013, 11:57 AM

Quote:

Originally Posted by Leonatus

Well, I checked it out by myself and found that [.\!?] [a-z] will find any lowercase after punctuation marks, with or without whitespace.

Code:

[\.\!?] [a-z]

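Outside square brackets, a period you want to match literally has to be escaped or it becomes a wildcard. Inside a character class it is already literal, so the backslashes here are optional.

finds: with 0 or one space (but not a nbsp)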

\s?[\.\!?][a-z](\s)?
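
To check how a literal space and \s? behave, the patterns can be compared in Python. This is a sketch with a made-up sample string; the [.!?]\s?[a-z] variant and the capitalize_after_punctuation helper are my own assumptions, not something posted in this thread, and the behaviour shown is that of Python's re module rather than Sigil's engine.

```python
import re

text = "One. two.three! four"

# A literal space in the pattern must be present in the text.
with_space = re.findall(r"[.!?] [a-z]", text)

# \s? makes the whitespace optional (zero or one character).
optional_space = re.findall(r"[.!?]\s?[a-z]", text)

def capitalize_after_punctuation(s):
    # Keep the punctuation and optional space, upper-case the letter after it.
    return re.sub(
        r"([.!?]\s?)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        s,
    )

print(with_space, optional_space, capitalize_after_punctuation(text))
```

A replace like this will also hit abbreviations such as "e.g.", so stepping through the matches is safer than Replace All.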

DiapDealer · 08-15-2013, 02:17 PM

Consider replacing [a-z] with \p{Ll}

That way, lower case unicode characters can be matched as well. You never know when a random "é" or "á" will bite you in the butt (and not just in the above regex).

[a-zA-Z] becomes \p{L}
[a-z] becomes \p{Ll}
[A-Z] becomes \p{Lu}

Leonatus · 08-16-2013, 05:18 AM

Ah, thanks!
Somewhere I had read that inside square brackets some marks don't need to be escaped - except "!". However, I'm through the text, but I shall try again with the escaped period - maybe there were no matches with period and I didn't notice it.

The major problem had been in the "replace" sector: I had to replace everything manually, because all of my ideas concerning regex were inserted literally (no success).

@DiapDealer: {Ll}: I understand neither the meaning of "L" nor that of the pipe. What is their general function?

DiapDealer · 08-16-2013, 08:41 AM

Quote:

Originally Posted by Leonatus

@DiapDealer: {Ll}: I understand neither the meaning of "L" nor that of the pipe. What is their general function?

That's actually a lowercase "L" rather than a pipe.

\p{L} matches any letter character in any language
\p{Ll} matches any lowercase letter character in any language
\p{Lu} matches any uppercase letter character in any language

Even books in English use accented characters that will be overlooked by [a-z].

NOTE: the L or the Ll or the Lu have no special regex meaning outside of the \p{} construct. They simply represent unicode properties/categories. \p{} matches a single character belonging to the specified category, and \P{} matches a single character NOT belonging to the specified category.
http://www.regular-expressions.info/unicode.html
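
The difference between [a-z] and the Ll category can be seen in a few lines of Python. Note the assumption: Python's built-in re module does not understand \p{...}, so this sketch checks the Unicode category with the standard unicodedata module instead; lowercase_letters is a hypothetical helper name.

```python
import re
import unicodedata

def lowercase_letters(text):
    # "Ll" is the same Unicode category that \p{Ll} refers to.
    return [ch for ch in text if unicodedata.category(ch) == "Ll"]

sample = "Caf\u00e9 \u00c1ngel"  # "Café Ángel" with precomposed accents

ascii_only = re.findall(r"[a-z]", sample)   # skips the accented e
unicode_aware = lowercase_letters(sample)   # includes it

print(ascii_only)
print(unicode_aware)
```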

Leonatus · 08-16-2013, 09:27 AM

Thank you!

I just see that lowercase letters after period have been matched, even without escaping the period.

What I further do not understand is why my command didn't care about the space between punctuation mark and letter, i.e. there was a match with and without whitespace.

BobC · 09-08-2013, 07:19 AM

I am looking to do some mass-renumbering of IDs in a book in order to insert endnote hyperlinks.

What I want to do is transform something like:

"id012345" .... "id012444"
to
"ref_1" .... "ref_100"

Significant here is that the last digit of the transformed number is not the same as that of the original. (It would actually be the footnote number extracted from the main text where it appears as [1] ... [100] )

Has anyone found a way to do this with Sigil's regex?

BobC

Doitsu · 09-08-2013, 07:26 AM

You can search for the footnote number with \[(\d+)\] and the footnote id with id(\d+) and then combine both.
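
One possible reading of "combine both", sketched in Python. The markup is an assumption: it supposes the id attribute and the bracketed footnote number sit in the same anchor with no other square brackets between them, which may not match the actual book. The function name renumber is made up.

```python
import re

# "id" plus digits in quotes, then anything without brackets, then [number].
COMBINED = re.compile(r'"id\d+"([^\[\]]*)\[(\d+)\]')

def renumber(html):
    # Rebuild the id from the footnote number captured in group 2 and
    # keep the text in between (group 1) unchanged.
    return COMBINED.sub(
        lambda m: f'"ref_{m.group(2)}"{m.group(1)}[{m.group(2)}]',
        html,
    )

sample = '<a id="id012345">[1]</a> text <a id="id012444">[100]</a>'
renumbered = renumber(sample)
print(renumbered)
```

Under the same assumption, the equivalent in a find and replace dialog would be the pattern above with "ref_\2"\1[\2] as the replacement.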

