Regex examples - Page 10

paulfiera · 09-25-2012, 11:15 AM

Thanks, everybody.

It seems that DiapDealer's regex...

Quote:

does not find anchor tags with text or other tags between the opening tag and the closing tag.

The only result found in the same book is:

Quote:

but it still finds anchor tags with href inside.

Jellby · 09-25-2012, 11:38 AM

Yes, it does because it searches for "any character but >" inside the quotes, and that includes the closing quote and the href part.

You probably want something like this:

Code:

<a class="([^"]*?)" id="([^"]*?)"></a>

Anyway, an anchor with href and no text is pretty useless.
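To see why the `[^"]*?` version behaves better, it can be tried outside Sigil. This sketch uses Python's `re` module as a stand-in for Sigil's PCRE engine (an assumption; the two agree on these simple constructs) and a sample line containing one real link and one empty `id`-only anchor.

```python
import re

# Jellby's pattern: [^"] cannot step past a closing quote, so an anchor
# whose class attribute is followed by href (instead of id) cannot match.
EMPTY_ANCHOR = re.compile(r'<a class="([^"]*?)" id="([^"]*?)"></a>')

# Sample line: one link with text, one empty anchor with an id.
sample = ('<a class="whatever" href="#here">this is a link</a> '
          '<a class="other" id="something"></a>')

for m in EMPTY_ANCHOR.finditer(sample):
    print(m.group(0))   # <a class="other" id="something"></a>
    print(m.groups())   # ('other', 'something')
```

Only the empty anchor is reported; the link with `href` is skipped because the class value cannot be extended past its closing quote.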

paulfiera · 09-25-2012, 11:54 AM

Thanks, Jellby

Quote:

Originally Posted by Jellby

Yes, it does because it searches for "any character but >" inside the quotes, and that includes the closing quote and the href part.

You probably want something like this:

Code:

<a class="([^"]*?)" id="([^"]*?)"></a>

That seems to be working alright

Quote:

Originally Posted by Jellby

Anyway, an anchor with href and no text is pretty useless.

Agree

DiapDealer · 09-25-2012, 02:52 PM

Quote:

Originally Posted by Jellby

<a class="whatever" href="#here">this is a link</a> <a class="other" id="something"></a>

The whole red part would be matched by the first (.*?), right?

Yep. I missed that. Quite obvious when you see it all spelled out in red & black.

This discussion is a perfect example of why I've started avoiding (.*?) if at all possible. It'll always bite you in the ass if it can.
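Here is the problem spelled out in a minimal Python sketch using Jellby's sample line. The `(.*?)` pattern below is an assumption about the shape of the earlier regex (the original post is not quoted here); the point is only that a lazy dot still expands across tag boundaries when that is the only way for the rest of the pattern to succeed.

```python
import re

sample = ('<a class="whatever" href="#here">this is a link</a> '
          '<a class="other" id="something"></a>')

# Lazy dot: "shortest possible" only means it stops at the first place
# where the REST of the pattern can succeed, not at the first quote.
lazy = re.search(r'<a class="(.*?)" id="(.*?)"></a>', sample)
print(lazy.group(1))
# whatever" href="#here">this is a link</a> <a class="other

# Negated class: [^"] can never cross a quote, so the match is confined
# to a single tag.
strict = re.search(r'<a class="([^"]*)" id="([^"]*)"></a>', sample)
print(strict.group(0))
# <a class="other" id="something"></a>
```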

meme · 09-25-2012, 02:57 PM

Quote:

Originally Posted by DiapDealer

Yep. I missed that. Quite obvious when you see it all spelled out in red & black.

This discussion is a perfect example of why I've started avoiding (.*?) if at all possible. It'll always bite you in the ass if it can.

Good thing there is now an option where you can enable or disable its use in the beta

meme · 09-25-2012, 03:00 PM

In case you haven't noticed, there is a new Search Editor in the 0.6.0 beta that allows you to save your searches (and to run them from a separate dialog if you want). You can even run a group of searches in order.

Some sample regexes are loaded if your list is empty (or if you import the examples/search_entries.ini file).

You can export and import entries. So it might be interesting if you post searches you might want to see in the default examples files, and also searches that others might want to import.

DiapDealer · 09-25-2012, 03:18 PM

Quote:

Originally Posted by meme

In case you haven't noticed, there is a new Search Editor in the 0.6.0 beta that allows you to save your searches (and to run them from a separate dialog if you want). You can even run a group of searches in order.

Some sample regexes are loaded if your list is empty (or if you import the examples/search_entries.ini file).

You can export and import entries. So it might be interesting if you post searches you might want to see in the default examples files, and also searches that others might want to import.

OK, this is me, swooning a bit.

I haven't had much time to play with the latest beta yet, but I can see I need to make the time.

Doitsu · 09-25-2012, 03:26 PM

Quote:

Originally Posted by meme

In case you haven't noticed, there is a new Search Editor in the 0.6.0 beta that allows you to save your searches (and to run them from a separate dialog if you want). You can even run a group of searches in order.

That's a cool feature that I actually missed. How about adding a Search Editor... button to the Find and Replace dialog?

meme · 09-25-2012, 04:21 PM

Quote:

Originally Posted by Doitsu

That's a cool feature that I actually missed. How about adding a Search Editor... button to the Find and Replace dialog?

Try the Tools Menu. All the fun stuff hides there

kiwidude · 09-25-2012, 05:09 PM

@meme - I am shocked you also forgot to suggest they try the right-click menu on the Find dropdown, to quickly recall a saved search or add the current one to the saved searches...

crutledge · 09-25-2012, 05:30 PM

Some ebooks capitalize for emphasis and some capitalize all proper names.

The following expression easily finds all caps words in a file: (\w{Lu}+\w).
The problem is that it finds all caps words, including those in headers and other places where caps are wanted.

I have been trying for some time to build a regex that will limit itself to those all-caps words between <p> tags, with no success.

Is there a way to do this?

Jellby · 09-26-2012, 03:56 AM

Doesn't this work?:

Code:

<p[> ].*(\w{Lu}+\w).*</p>

(it needs the dot not to match newlines, and it would only find one word per paragraph)
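Both caveats are easy to check with a small Python sketch. It swaps `(\w{Lu}+\w)` for an ASCII-only `\b([A-Z]{2,})\b` (an assumption: Python's `re` has no `\p{Lu}`-style property classes) and uses made-up sample markup.

```python
import re

# Paragraph-restricted search in the spirit of Jellby's regex.
# Without re.DOTALL, "." does not match "\n", so a match cannot run
# from one line (paragraph) into the next.
PARA_CAPS = re.compile(r'<p[> ].*\b([A-Z]{2,})\b.*</p>')

html = ('<h1>CHAPTER ONE</h1>\n'
        '<p>The SHIP hit the ROCK.</p>\n'
        '<p class="x">Nothing shouted here.</p>\n')

# The heading is ignored, and only ONE word per paragraph is reported.
print(PARA_CAPS.findall(html))  # ['ROCK']

# A paragraph split over two lines is not matched at all.
print(PARA_CAPS.findall('<p>Split\nACROSS lines</p>'))  # []
```

Because the leading `.*` is greedy, the single word reported is the last all-caps word in the paragraph, not the first.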

In similar cases, I often find it easier to first mark the words I don't want to match by adding some otherwise unused character (¬ or | are good candidates). Then it's easier to match what I do want to match, and I can easily remove the marking character at the end.
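The marking trick can be sketched the same way. In this Python version the unwanted words are assumed to be all-caps words inside `<h1>` to `<h6>` headings, the helper name `caps_outside_headings` is hypothetical, and `[A-Z]{2,}` is again an ASCII stand-in for an uppercase-word class.

```python
import re

MARK = "¬"  # otherwise-unused marker character

def caps_outside_headings(html):
    """Return (all-caps words outside headings, html with markers removed)."""
    # Step 1: mark every all-caps word that sits inside a heading.
    def mark_heading(m):
        return re.sub(r"\b([A-Z]{2,})\b", MARK + r"\1", m.group(0))
    marked = re.sub(r"<h([1-6])[^>]*>.*?</h\1>", mark_heading, html)

    # Step 2: match only all-caps words NOT preceded by the marker.
    found = re.findall(r"(?<!" + MARK + r")\b[A-Z]{2,}\b", marked)

    # Step 3: strip the markers again.
    return found, marked.replace(MARK, "")

html = "<h1>CHAPTER ONE</h1>\n<p>He said NO to the KING.</p>"
words, restored = caps_outside_headings(html)
print(words)             # ['NO', 'KING']
print(restored == html)  # True
```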

Doitsu · 09-26-2012, 03:58 AM

A quick and dirty solution would be:

Find:([[:upper:]]{2,})(.*?)</p>
Replace:<i>\L\1\E</i>\2</p>

This regular expression searches for uppercase words with at least two uppercase letters and converts them to lowercase italics. (For other case-transformation examples, see my other post.)

Since this expression will only match one uppercase word per paragraph, you'll have to run it repeatedly if your paragraphs contain multiple uppercase words.
Theoretically, it might also miss some uppercase words or match more than one paragraph. In other words, don't use it with Replace All.
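A rough Python equivalent of that Find/Replace makes the one-word-per-paragraph behaviour visible. Python's `re` supports neither `[[:upper:]]` nor the `\L...\E` case switches, so this sketch (an assumption, not Sigil's engine) uses `[A-Z]` and a replacement function that lowercases the captured word.

```python
import re

# Same shape as the Sigil regex: an all-caps run, then the rest of the
# paragraph up to </p>. Without re.DOTALL the match stays on one line.
CAPS_TO_ITALIC = re.compile(r"([A-Z]{2,})(.*?)</p>")

def one_pass(html):
    # Mimics Replace:<i>\L\1\E</i>\2</p>; re.sub replaces every match,
    # but each match consumes the rest of its paragraph.
    return CAPS_TO_ITALIC.sub(
        lambda m: "<i>" + m.group(1).lower() + "</i>" + m.group(2) + "</p>",
        html)

html = "<p>The SHIP hit the ROCK.</p>"
first = one_pass(html)
print(first)            # <p>The <i>ship</i> hit the ROCK.</p>
print(one_pass(first))  # <p>The <i>ship</i> hit the <i>rock</i>.</p>
```

The sketch behaves like Replace All on purpose so that the need for a second pass shows up; in Sigil, stepping through matches one at a time remains the safer route.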

If this regular expression actually works for you, please do me a favor and upload a fewer books.

JMikeD · 09-27-2012, 06:51 PM

I have a number of older books that have been through the OCR process and ended up with paragraph breaks in the middle of sentences. In OpenOffice, I could get almost all of these fixed by using a regex:

Find: \p([a-z])
Replace: \1\2

I don't seem to be able to get a similar function to work in the Find and Replace of Sigil. The HTML code looks like:

Quote:

bad policy to answer a

direct question. He kept shaking his head like a china figure.

I need to be able to glue sentences such as this back together. Any ideas?

Thanks.

Doitsu · 09-27-2012, 07:13 PM

I'm sure that there's a more elegant solution, but you could simply search for a paragraph ending in a lowercase letter or a punctuation sign followed by a paragraph starting with a lowercase letter and then join them with a space.

Code:

Find:([[:lower:]],*;*:*)</span></p>\n\n  <p class="calibre"><span>([[:lower:]])

Code:

Replace:\1 \2

(The regex assumes that Tidy is on and that there are two spaces before each <p>.)
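A Python sketch of the same join can be used to test the pattern outside Sigil. It assumes the Tidy-style layout described in the post (each `<p class="calibre"><span>` on its own line, indented by two spaces, with a blank line between paragraphs), swaps `[[:lower:]]` for `[a-z]`, and reuses the sentence from the question as sample text.

```python
import re

# Lowercase letter (plus any trailing , ; :) at the end of one paragraph,
# followed by a paragraph that starts with a lowercase letter.
JOIN = re.compile(
    r'([a-z],*;*:*)</span></p>\n\n  <p class="calibre"><span>([a-z])')

html = ('  <p class="calibre"><span>bad policy to answer a</span></p>\n\n'
        '  <p class="calibre"><span>direct question. He kept shaking his head'
        ' like a china figure.</span></p>')

# Glue the two halves back together with a single space.
print(JOIN.sub(r'\1 \2', html))
```

If Tidy is off or the indentation differs, the literal `\n\n  ` part of the pattern will not match and nothing is joined.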

