Regex examples - Page 9

Doitsu · 08-11-2012, 10:44 AM

A quick and dirty solution would be:

Find: (chapter) ([[:lower:]]+)
Replace: \u\1 \u\2

This requires Sigil 0.5.3 (or higher).

Gunnerp245 · 08-11-2012, 10:03 PM

Quote:

Originally Posted by Gunnerp245

I would like to change the capitalization of a particular phrase across a book, e.g. chapter one to Chapter One. I can detect the instances using (\D+) (\D+) and know the replacement would be \1 \2, but not how to change the capitalization.

Quote:

Originally Posted by Doitsu

A quick and dirty solution would be:
Find: (chapter) ([[:lower:]]+)
Replace: \u\1 \u\2
This requires Sigil 0.5.3 (or higher).

@Doitsu - I left out key information initially: I am attempting this in calibre's 'search and replace'.

.

Doitsu · 08-12-2012, 03:35 AM

Quote:

Originally Posted by Gunnerp245

@Doitsu - I left out key information initially: I am attempting this in calibre's 'search and replace'.

.

Starting with version 0.5.3, Sigil uses Perl-compatible regular expressions (PCRE), including the \u operator (which capitalizes the first letter of the text that follows it in the replacement).
AFAIK, Calibre uses the Python regular expression library, which doesn't support the \u operator.
The expressions that I suggested will work in Sigil or any text editor with PCRE support.
Is there any particular reason why you want to use Calibre to replace the text?
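
For anyone who needs the same result from Python's re module, here is a minimal sketch. It assumes plain ASCII text, uses [a-z]+ in place of [[:lower:]]+ (Python's re has no POSIX character classes), and does the capitalization in a replacement function because \u is not available there. The name capitalize_chapter is made up for illustration, and whether Calibre's search and replace accepts a function like this depends on the Calibre version.

```python
import re

# [a-z]+ stands in for [[:lower:]]+, which Python's re module does not support.
CHAPTER = re.compile(r"(chapter) ([a-z]+)")


def capitalize_chapter(text):
    """Turn every 'chapter xyz' into 'Chapter Xyz' (hypothetical helper)."""
    def fix(match):
        # Equivalent of the PCRE replacement \u\1 \u\2
        return match.group(1).capitalize() + " " + match.group(2).capitalize()

    return CHAPTER.sub(fix, text)


print(capitalize_chapter("see chapter one and chapter two"))
```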

Jellby · 08-12-2012, 04:40 AM

Quote:

Originally Posted by Doitsu

Is there any particular reason why you want to use Calibre to replace the text?

And is there any particular reason why you ask a Calibre question in the Sigil forum?

Gunnerp245 · 08-12-2012, 10:51 AM

@Doitsu/Jellby

I was reading through the regex sticky and posted my question before realizing which software forum I was in. I have been able to glean very helpful information. Though the other forum has a regex sticky it does not seem as detailed as this one.

I have moved my query here.

Greygor · 08-23-2012, 10:55 AM

I hate my first post to be a question rather than an answer but needs must when the devil drives.

I have an epub where speech quotes are missing from the start of the line.

e.g.

Quote:

The gets one," ff 11

So my first task was finding an expression that would find lines where this occurs.

This seems to work (I know there are cases where it fails, but I'm just finding and not auto-fixing).

Quote:

"\>[^"](\w*\W*)*"

However when I try it in Sigil it completely fails.

Is there something special about Sigil's regex that I'm overlooking?

Many thanks in advance

Doitsu · 08-23-2012, 01:11 PM

IMHO, the problem is \w*\W*, which matches a sequence of 0 or more word characters followed by 0 or more non-word characters. I.e., it will at most match one word plus a space or punctuation character. Try .*? instead:

Code:

"\>[^"](.*?)"

Greygor · 08-23-2012, 01:22 PM

Quote:

Originally Posted by Doitsu

IMHO, the problem is \w*\W*, which matches a sequence of 0 or more word characters followed by 0 or more non-word characters. I.e., it will at most match one word plus a space or punctuation character. Try .*? instead:

Code:

"\>[^"](.*?)"

Hi thanks for the response.

Outside of Sigil the regex that I was using worked fine which is what I'm finding odd.

Using

Quote:

"\>[^"](.*?)"

lets me find lots of

Quote:

"><span class="

Unfortunately that's not quite right, but at least it finds something in Sigil, which mine doesn't.

I need to think about this; I'm missing something really obvious.
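
One possible reason for the "><span class=" hits, sketched in Python rather than Sigil: [^"] also accepts <, so the expression happily runs from the end of one tag to the first attribute quote of the next. The narrower pattern below, which also forbids < between the end of the tag and the closing quote, is only an assumption about what is wanted, and both sample lines are invented for illustration.

```python
import re

# The expression suggested above; [^"] lets "<" through.
broad = re.compile(r'"\>[^"](.*?)"')
# Hypothetical narrowing: no '"' and no "<" between the tag end and the closing quote.
narrow = re.compile(r'">[^"<]+"')

# Invented sample lines (the markup around the text is an assumption).
missing_quote = '<p class="calibre1">The gets one," he said.</p>'
nested_span = '<p class="calibre1"><span class="bold">Hello</span></p>'

print(broad.search(nested_span).group(0))     # the unwanted tag-to-tag hit
print(narrow.search(nested_span))             # no hit on nested tags
print(narrow.search(missing_quote).group(0))  # the line with the missing quote
```

This still only finds candidates; lines that use curly quotes or contain inline tags before the closing quote would need a different expression.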

DiapDealer · 08-23-2012, 01:50 PM

Quote:

Originally Posted by Greygor

Outside of Sigil the regex that I was using worked fine which is what I'm finding odd.

Not really that odd. There are several different regex engines that all have subtle differences. So the questions would be: what application are you using where your original regex does succeed? And what version of Sigil are you using where it doesn't succeed?

paulfiera · 09-25-2012, 10:01 AM

Most surely, I'm not understanding this the right way.

I'm cleaning up some epubs and have noticed that some of them have anchor tags with a class and an id but without any hyperlink. Some epubs have several hundred in between the text.

So I'm using this regex to find anchor links with nothing inside them

Quote:

The problem is that it also finds these tags with spans and even text inside.

I would like to be able to restrict the findings to only this situation.

Many thanks!

Doitsu · 09-25-2012, 10:41 AM

Quote:

Originally Posted by paulfiera

The problem is that it also finds these tags with spans and even text inside.

Your regex looks fine to me. Can you post some specific examples of unwanted html tags matched by your regex and the Sigil version that you're using?

DiapDealer · 09-25-2012, 10:47 AM

I'm not certain why that expression would match instances with spans or text inside the anchor tags. It shouldn't really.

You might try:

Code:

<a class="([^>]*?)" id="([^>]*?)"></a>

instead ... just to check.

But I can't get your expression to misbehave, really. It seems to do (for me anyway) what you've intended it to do. Can you give any examples of code it's matched that you don't think it should match?

theducks · 09-25-2012, 10:48 AM

Quote:

Originally Posted by paulfiera

Most surely, I'm not understanding this the right way.

I'm cleaning up some epubs and have noticed that some of them have anchor tags with a class and an id but without any hyperlink. Some epubs have several hundred in between the text.

So I'm using this regex to find anchor links with nothing inside them

The problem is that it also finds these tags with spans and even text inside.

I would like to be able to restrict the findings to only this situation.

Many thanks!

Your REGEX is fine, as Doitsu said. It is also overly broad (second term) ;)
which is why it is matching </a>

If your ids end in numbers, use that to narrow the scope: (.+?\d+)"></a>

Jellby · 09-25-2012, 11:40 AM

Quote:

Originally Posted by DiapDealer

But I can't get your expression to misbehave, really. It seems to do (for me anyway) what you've intended it to do. Can you give any examples of code it's matched that you don't think it should match?

<a class="whatever" href="#here">this is a link</a> <a class="other" id="something"></a>

The whole red part would be matched by the first (.*?), right?
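
A quick way to check this outside Sigil, as a sketch in Python: the first pattern is an assumption (the original expression is not shown in the thread, so it is reconstructed from the (.*?) mentioned above), while the second is DiapDealer's alternative. Python's re and Sigil's PCRE behave the same way for these two patterns, as far as I know.

```python
import re

# Assumed form of the original expression (reconstructed, not quoted from the thread).
lazy_dot = re.compile(r'<a class="(.*?)" id="(.*?)"></a>')
# DiapDealer's alternative: the groups may not cross a ">".
no_gt = re.compile(r'<a class="([^>]*?)" id="([^>]*?)"></a>')

sample = ('<a class="whatever" href="#here">this is a link</a> '
          '<a class="other" id="something"></a>')

# Lazy .*? still expands until '" id="' is found, even across tag boundaries.
print(lazy_dot.search(sample).group(0))
print(lazy_dot.search(sample).group(1))

# [^>]*? cannot leave the opening tag, so only the empty anchor matches.
print(no_gt.search(sample).group(0))
```

Lazy matching only picks the shortest match from a given starting point; it does not stop the match from starting at an earlier <a class=" and swallowing everything up to the next " id=".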

paulfiera · 09-25-2012, 11:59 AM

Thanks, Doitsu and DiapDealer

This is from Clive Barker's Imajica

Using

Quote:

and clicking on Count All, it finds 18 matches.

Clicking on Find, the first match is this one:

Quote:

<a class="calibre16" href="../Text/Imajica_split_211.html#filepos2489718" id="filepos69564">William</a>—and they had only argued once, but it had been a telling exchange. She’d accused him of always looking at other women; looking, looking, as though for the next conquest. Perhaps because he didn’t care for her too much, he’d replied honestly and told her she was right. He was stupid for her sex. Sickened in their absence, blissful in their company: love’s fool. She’d replied that while his obsession might be healthier than her husband’s—which was money and its manipulation—his behavior was still neurotic. Why this endless hunt? she’d asked him. He’d answered with some folderol about seeking the idealwoman, but he’d known the truth even as he was spinning her this tosh, and it was a bitter thing. Too bitter, in fact, to be put on his tongue. In essence, it came down to <a class="calibre22"></a>

Clicking on Find again, it matches this one:

Quote:

<a class="calibre16" href="../Text/Imajica_split_199.html#filepos2416389" id="filepos73127">Gloriana</a>, one of his five cats, escaped in search of a mate. “Too slow, sweetie!” he told her. She yowled at him in complaint. “I keep her fat so she’s slow,” he said. “And I don’t feel so piggy myself.”<a class="calibre22"></a>

This is on Sigil 0.5.3

Strange.
