MobileRead Forums

MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Regex examples (https://www.mobileread.com/forums/showthread.php?t=167971)

Doitsu 08-11-2012 10:44 AM

A quick and dirty solution would be:

Find: (chapter) ([[:lower:]]+)
Replace: \u\1 \u\2

This requires Sigil 0.5.3 (or higher).

Gunnerp245 08-11-2012 10:03 PM

Quote:

Originally Posted by Gunnerp245 (Post 2181500)
I would like to change the capitalization of a particular phrase across a book, e.g. chapter one to Chapter One. I can detect the instances using (\D+) (\D+) and know the replacement would be \1 \2, but not how to change the capitalization.

Quote:

Originally Posted by Doitsu (Post 2181542)
A quick and dirty solution would be:
Find: (chapter) ([[:lower:]]+)
Replace: \u\1 \u\2
This requires Sigil 0.5.3 (or higher).

@Doitsu - Left out key information initially: I am attempting this in calibre 'search and replace'.

Doitsu 08-12-2012 03:35 AM

Quote:

Originally Posted by Gunnerp245 (Post 2182012)
@Doitsu - Left out key information initially: I am attempting this in calibre 'search and replace'.

Starting with version 0.5.3, Sigil uses Perl compatible regular expressions (PCRE) including the \u operator (which capitalizes the first letter of a string).
AFAIK, Calibre uses the Python regular expression library, which doesn't support the \u operator.
The expressions that I suggested will work in Sigil or any text editor with PCRE support.
Is there any particular reason why you want to use Calibre to replace the text?
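
For anyone who ends up in a Python-based tool, here is a minimal sketch of the same capitalization done with Python's standard re module, where the replacement is a function instead of a \u escape. The name capitalize_chapter and the sample text are made up for illustration, and [a-z]+ stands in for [[:lower:]]+, so it only covers ASCII letters. This is plain Python, not a statement about how Calibre's search and replace dialog should be driven.

```python
import re

# Stand-in for the Find pattern (chapter) ([[:lower:]]+).
# Python's re module has no POSIX classes, so [a-z] is used here (ASCII only).
PATTERN = re.compile(r"(chapter) ([a-z]+)")


def capitalize_chapter(text):
    """Uppercase the first letter of both captured words (hypothetical helper)."""
    def fix(match):
        first, second = match.group(1), match.group(2)
        return first[0].upper() + first[1:] + " " + second[0].upper() + second[1:]
    return PATTERN.sub(fix, text)


if __name__ == "__main__":
    print(capitalize_chapter("chapter one"))
```

The replacement function plays the role of \u\1 \u\2: it receives each match and returns the text to substitute.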

Jellby 08-12-2012 04:40 AM

Quote:

Originally Posted by Doitsu (Post 2182180)
Is there any particular reason why you want to use Calibre to replace the text?

And is there any particular reason why you ask a Calibre question in the Sigil forum? ;)

Gunnerp245 08-12-2012 10:51 AM

@Doitsu/Jellby

I was reading through the regex sticky and posted my question before realizing which software forum I was in. I have been able to glean very helpful information. Though the other forum has a regex sticky it does not seem as detailed as this one.

I have moved my query here.

Greygor 08-23-2012 10:55 AM

I hate my first post to be a question rather than an answer but needs must when the devil drives.

I have an epub where speech quotes are missing from the start of the line

e.g.

Quote:

<span class="bold calibre4">The gets one," ff 11
So my first task was finding an expression that would find lines where this occurs.

This seems to work (I know there are cases where it fails, but I'm just finding and not auto-fixing)

Quote:

"\>[^"](\w*\W*)*"
However when I try it in Sigil it completely fails.

Is there something special about Sigil's regex that I'm overlooking?

Many thanks in advance

Doitsu 08-23-2012 01:11 PM

IMHO, the problem is \w*\W*, which matches a sequence of 0 or more word characters followed by 0 or more non-word characters. I.e., it will at most match one word plus a space or punctuation character. Try .*? instead:

Code:

"\>[^"](.*?)"

Greygor 08-23-2012 01:22 PM

Quote:

Originally Posted by Doitsu (Post 2195154)
IMHO, the problem is \w*\W*, which matches a sequence of 0 or more word characters followed by 0 or more non-word characters. I.e., it will at most match one word plus a space or punctuation character. Try .*? instead:

Code:

"\>[^"](.*?)"

Hi thanks for the response.

Outside of Sigil the regex that I was using worked fine which is what I'm finding odd.

Using
Quote:

"\>[^"](.*?)"
lets me find lots of
Quote:

"><span class="
but unfortunately that's not quite right, but at least it finds something in Sigil which mine doesn't :)

Need to think about this, I'm missing something really obvious.
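
A small sketch of why the suggested expression also lands on "><span class=": after the quote and the >, the [^"] happily accepts a <, so the match can start at the end of one tag and run into the next tag's attribute. The sample line is a hypothetical one modelled on the snippet above, and the stricter TEXT_ONLY pattern is only an assumption about what might be wanted, not something proposed in the thread. It uses Python's re module, so Sigil's PCRE may behave differently in other respects.

```python
import re

# Hypothetical sample line, modelled on the snippet quoted earlier in the thread.
SAMPLE = '<p class="calibre1"><span class="bold calibre4">The gets one," ff 11</span></p>'

# The suggested pattern: a quote, '>', one non-quote character,
# then as little as possible up to the next quote.
SUGGESTED = re.compile(r'"\>[^"](.*?)"')

# Assumed variant: also refuse '<' after the '>', so a match cannot
# begin inside the following tag.
TEXT_ONLY = re.compile(r'">[^"<]+"')


def matches(pattern, text):
    """Return the full text of every match, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(matches(SUGGESTED, SAMPLE))
    print(matches(TEXT_ONLY, SAMPLE))
```

On this sample the first pattern reports both the tag-to-tag span and the line with the missing opening quote, while the variant reports only the latter.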

DiapDealer 08-23-2012 01:50 PM

Quote:

Originally Posted by Greygor (Post 2195168)
Outside of Sigil the regex that I was using worked fine which is what I'm finding odd.

Not really that odd. There are several different regex engines that all have subtle differences. So the questions would be: what application are you using where your original regex does succeed? And what version of Sigil are you using where it doesn't succeed?

paulfiera 09-25-2012 10:01 AM

Strange issue
 
Most surely, I'm not understanding this the right way.

I'm cleaning up some epubs and have noticed that some of them have anchor tags with a class and an id but without any hyperlink. Some epubs have several hundred in between the text.

So I'm using this regex to find anchor links with nothing inside them

Quote:

<a class="(.*?)" id="(.*?)"></a>
The problem is that it also finds these tags with spans and even text inside.

I would like to be able to restrict the findings to only this situation.

Many thanks!

Doitsu 09-25-2012 10:41 AM

Quote:

Originally Posted by paulfiera (Post 2236440)
The problem is that it also finds these tags with spans and even text inside.

Your regex looks fine to me. Can you post some specific examples of unwanted html tags matched by your regex and the Sigil version that you're using?

DiapDealer 09-25-2012 10:47 AM

I'm not certain why that expression would match instances with spans or text inside the anchor tags. It shouldn't really.

You might try:
Code:

<a class="([^>]*?)" id="([^>]*?)"></a>
instead ... just to check.

But I can't get your expression to misbehave, really. It seems to do (for me anyway) what you've intended it to do. Can you give any examples of code it's matched that you don't think it should match?

theducks 09-25-2012 10:48 AM

Quote:

Originally Posted by paulfiera (Post 2236440)
Most surely, I'm not understanding this the right way.

I'm cleaning up some epubs and have noticed that some of them have anchor tags with a class and an id but without any hyperlink. Some epubs have several hundred in between the text.

So I'm using this regex to find anchor links with nothing inside them

<a class="(.*?)" id="(.*?)"></a>

The problem is that it also finds these tags with spans and even text inside.

I would like to be able to restrict the findings to only this situation.

Many thanks!

Your regex is fine, as Doitsu said. It is also overly broad (second term) ;)
which is why it is matching </span></a>

If your id ends in a number, use that to narrow the scope: (.+?\d+)"></a>

Jellby 09-25-2012 11:40 AM

Quote:

Originally Posted by DiapDealer (Post 2236521)
But I can't get your expression to misbehave, really. It seems to do (for me anyway) what you've intended it to do. Can you give any examples of code it's matched that you don't think it should match?

<a class="whatever" href="#here">this is a link</a> <a class="other" id="something"></a>

The whole red part would be matched by the first (.*?), right?

paulfiera 09-25-2012 11:59 AM

Thanks, Doitsu and DiapDealer

This is from Clive Barker's Imajica

Using

Quote:

<a class="(.*?)" id="(.*?)"></a>
and clicking on Count All, it finds 18 matches.

Clicking on Find, the first match is this one:

Quote:

<a class="calibre16" href="../Text/Imajica_split_211.html#filepos2489718" id="filepos69564"><span class="calibre17">William</span></a>—and they had only argued once, but it had been a telling exchange. She’d accused him of always looking at other women; looking, looking, as though for the next conquest. Perhaps because he didn’t care for her too much, he’d replied honestly and told her she was right. He was stupid for her sex. Sickened in their absence, blissful in their company: love’s fool. She’d replied that while his obsession might be healthier than her husband’s—which was money and its manipulation—his behavior was still neurotic. Why this endless hunt? she’d asked him. He’d answered with some folderol about seeking the idealwoman, but he’d known the truth even as he was spinning her this tosh, and it was a bitter thing. Too bitter, in fact, to be put on his tongue. In essence, it came down to <a class="calibre22"></a>
Clicking on Find again, it matches this one:


Quote:

<a class="calibre16" href="../Text/Imajica_split_199.html#filepos2416389" id="filepos73127"><span class="calibre17">Gloriana</span></a>, one of his five cats, escaped in search of a mate. “Too slow, sweetie!” he told her. She yowled at him in complaint. “I keep her fat so she’s slow,” he said. “And I don’t feel so piggy myself.”<a class="calibre22"></a>
This is on Sigil 0.5.3

Strange.
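
The two matches above fit Jellby's point: (.*?) is lazy, but it still grows across quotes and tags until the rest of the pattern can succeed, so the second group runs from the id all the way to the next "></a>. Here is a minimal sketch using Python's re module on a shortened, hypothetical version of the first match (the href, the filler text and the last anchor are made up). DiapDealer's [^>]*? variant is included for comparison. Sigil uses PCRE, so treat this as an illustration of the matching logic, not a test of Sigil itself.

```python
import re

# Shortened, hypothetical stand-in for the first match reported above,
# followed by one anchor that really is empty and has both class and id.
SAMPLE = (
    '<a class="calibre16" href="../Text/x.html#filepos1" id="filepos69564">'
    '<span class="calibre17">William</span></a> some text <a class="calibre22"></a>'
    ' <a class="calibre5" id="filepos7"></a>'
)

# The original expression: both groups may cross quotes and tag boundaries.
ORIGINAL = re.compile(r'<a class="(.*?)" id="(.*?)"></a>')

# DiapDealer's variant: neither group may contain '>', so a match stays inside one tag.
NO_GT = re.compile(r'<a class="([^>]*?)" id="([^>]*?)"></a>')


def find_all(pattern, text):
    """Return the full text of every match, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("no '>'", NO_GT)):
        print(name, find_all(pattern, SAMPLE))
```

With the original expression the first hit starts at the linked anchor and ends at <a class="calibre22"></a>. With the [^>]*? variant only the genuinely empty anchor is reported.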


All times are GMT -4.