Old 01-26-2012, 08:10 PM   #1
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
restricting regex to single lines of code?

Not too long ago I asked a question about how to do the opposite of this. Naturally, now I need to know how to do the opposite of that...

I'm currently working on a document in which all double quotation marks have been replaced with question marks. Obviously, I'm trying to undo that. The way I had planned on doing that is by searching for strings that begin with ? and end with two consecutive punctuation marks, the latter of which is also a ? (i.e., .? or !? or ,? or ??).
The search I was using was:
Code:
\?([^\.]*)\.\?
I was then going to do subsequent searches with different punctuation marks in between the brackets. Unfortunately, I never got that far because the above search was being too greedy.

example:
Code:
  <p>?Or the television reports??</p>

  <p>?No.?</p>
for the above text, the search matches "??</p> <p>?No.?" instead of just "?No.?"

How do I do this right? (PS - using "<" as a marker won't work because not all dialogue finishes at the end of a paragraph ---> eg. <p>"Let's get out of here!" he yelled.</p>)

Last edited by ElMiko; 01-28-2012 at 12:18 AM.
Old 01-26-2012, 08:27 PM   #2
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by ElMiko View Post
Not too long ago I asked a question about how to do the opposite of this. Naturally, now I need to know how to do the opposite of that...

I'm currently working on a document in which all double quotation marks have been replaced with question marks. Obviously, I'm trying to undo that. The way I had planned on doing that is by searching for strings that begin with ? and end with two consecutive punctuation marks, the latter of which is also a ? (i.e., .? or !? or ,? or ??).
The search I was using was:
Code:
\?([^\.]*)\.\?
I was then going to do subsequent searches with different punctuation marks in between the brackets. Unfortunately, I never got that far because the above search was being too greedy.

example:
Code:
  <p>?Or the television reports??</p>

  <p>?No.?</p>
for the above text, the search matches "??</p> <p>?No.?" instead of just "?No.?"

How do I do this right? (PS - using "<" as a marker won't work because not all dialogue finishes at the end of a paragraph ---> eg. <p>"Let's get out of here!" he yelled.</p>)
Did you remember to escape things?
(\!\?|\?\?|\.\?) Just add all the escaped combinations you are looking for, separated by pipes.
Old 01-26-2012, 08:58 PM   #3
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Yeah, I mean, that's how my search is set up right now (as per the example), but as I say, it's being too greedy in what it matches.
Old 01-26-2012, 09:38 PM   #4
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by ElMiko View Post
Yeah, I mean, that's how my search is set up right now (as per the example), but as I say, it's being too greedy in what it matches.
End it with a *? outside the ()
Is it possible you are trying to do this in just 1 pass?
I would take little bites, then come back for more.
Old 01-26-2012, 10:09 PM   #5
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by theducks View Post
End it with a *? outside the ()
Is it possible you are trying to do this in just 1 pass?
I would take little bites, then come back for more.
Haha, no it isn't possible! Quite the opposite. As I said, I was planning on doing 4 separate searches for each punctuation combination.

So what you're suggesting in order to find:
?No.?
in my initial example is:
Code:
\?([^\.]*)*?\.\?
That expression doesn't match any strings.
Old 01-26-2012, 10:31 PM   #6
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
For trailing quotes, with the punctuation inside the quotes:

Search
Code:
(\!|\.|\,|\?)\?+?
Replace
Code:
\1"
For leading quotes:

Search
Code:
\?([A-Za-z])+?
Replace
Code:
"\1
theducks is offline   Reply With Quote
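The two search/replace pairs in the post above can be sketched in Python as follows. This is only an illustration under a few assumptions: Python's `re` module stands in for Sigil's PCRE engine, the helper name `restore_quotes` is made up, and the patterns are simplified equivalents of the ones posted (a character class instead of the alternation, and no lazy quantifier).

```python
import re

def restore_quotes(text):
    """Turn stray '?' stand-ins back into double quotes in two passes."""
    # Pass 1 (trailing): a '?' right after closing punctuation becomes '"'.
    text = re.sub(r'([!.,?])\?', r'\1"', text)
    # Pass 2 (leading): a '?' right before a letter becomes '"'.
    text = re.sub(r'\?([A-Za-z])', r'"\1', text)
    return text

sample = "<p>?Or the television reports??</p>\n\n<p>?No.?</p>"
print(restore_quotes(sample))
```

Because each pass only looks at two adjacent characters, neither can run across a line break. The trade-off is that a genuine question mark directly after other punctuation (for example after an ellipsis) would also be converted, so the results still need a check by eye.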
Old 01-26-2012, 11:23 PM   #7
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
My guess would be something like this, tho I'm 90% asleep and off to bed... haven't tested it, but should work in most cases.
Code:
\?(\w[^?]+[[:punct:]])\?
replace : “\1”
One of those times you really need to grep/list all the results to make sure.
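A rough Python rendering of that suggestion, listing every hit before replacing anything. Two assumptions to note: Python's `re` module is used in place of Sigil's PCRE engine, and since `re` does not support the POSIX class `[[:punct:]]`, an explicit set of the four punctuation marks from the first post is swapped in.

```python
import re

# Explicit punctuation set used in place of PCRE's [[:punct:]],
# which Python's re module does not support.
QUOTED = re.compile(r'\?(\w[^?]+[!.,?])\?')

sample = "<p>?Or the television reports??</p>\n\n<p>?No.?</p>"

# List every candidate first, so each one can be checked by eye.
for m in QUOTED.finditer(sample):
    print(m.start(), repr(m.group(0)))

# Then wrap the captured text in curly quotes.
fixed = QUOTED.sub(r'“\1”', sample)
print(fixed)
```

Note that `[^?]+` is a negated class, so it can still run across a line break when no other ? gets in the way; adding `\n` to the class would keep each match on one line.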
Old 01-26-2012, 11:41 PM   #8
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Thank you both!

How do i list all results?
Old 01-27-2012, 10:03 AM   #9
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
You can't at the moment within Sigil. I use my shell's regex module to check inside the .epub... not too sure what the easiest method would be on Windows.
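One platform-neutral way to list every match is a short Python script, since an .epub is a zip archive. This is a sketch, not the shell approach described above: the function name `list_matches` is made up, Python's `re` is assumed to be close enough to PCRE for the pattern in use, and the book's text files are assumed to be UTF-8 with .xhtml/.html/.htm extensions.

```python
import re
import zipfile

def list_matches(epub_path, pattern):
    """Yield (file name, line number, matched text) for every hit in the book's text files."""
    regex = re.compile(pattern)
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            # Only look at the text documents, not images, CSS, or fonts.
            if not name.lower().endswith(('.xhtml', '.html', '.htm')):
                continue
            text = book.read(name).decode('utf-8', errors='replace')
            for m in regex.finditer(text):
                # Line number of the first character of the match.
                line_no = text.count('\n', 0, m.start()) + 1
                yield name, line_no, m.group(0)
```

For example, looping over `list_matches('book.epub', r'\?\w[^?\n]+[!.,?]\?')` and printing each tuple gives a grep-like list to review before running the replace in Sigil (`book.epub` is a placeholder name).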
Old 01-27-2012, 02:41 PM   #10
congngo
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
I don't think Sigil can do that. You can do this on a Linux system, as Serpentine said.

Last edited by congngo; 01-27-2012 at 02:58 PM. Reason: newer
Old 01-27-2012, 05:50 PM   #11
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Ah, I see. Thanks for the heads-up.

Part of my confusion is that I found myself in a position earlier where I had to start the search string with (?s) in order to make it match more than one line of code. But now when I want to restrict the search to a single line of code, it automatically includes multiple lines! What gives? Am I just misunderstanding the mechanics of reg-ex searches?
Old 01-27-2012, 07:34 PM   #12
congngo
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
Because version 0.4.2 uses QRegExp (a regular expression engine) and version 0.5.0 uses PCRE. It was explained earlier by user_none. PCRE is better but has a different syntax.
Old 01-27-2012, 09:25 PM   #13
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
(?s) is 'single line' mode: it evaluates everything as a single line, so a dot is no longer restricted to one line and will run across line breaks. However, if you explicitly search for \s, it will cross line breaks even without single-line mode, because \s matches the line-break characters [\n\r].

In multiline mode (?m), the anchors ^ and $ match at the start and end of each line, rather than only at the start and end of the whole string, so you can use them more than once in an expression.

As always, check out http://www.pcre.org/pcre.txt
It's surprisingly easy to read, just search around for a good starting point.
Old 01-27-2012, 11:23 PM   #14
ElMiko
Fanatic
Posts: 502
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Quote:
Originally Posted by Serpentine View Post
(?s) is 'single line' mode: it evaluates everything as a single line, so a dot is no longer restricted to one line and will run across line breaks. However, if you explicitly search for \s, it will cross line breaks even without single-line mode, because \s matches the line-break characters [\n\r].

In multiline mode (?m), the anchors ^ and $ match at the start and end of each line, rather than only at the start and end of the whole string, so you can use them more than once in an expression.

As always, check out http://www.pcre.org/pcre.txt
It's surprisingly easy to read, just search around for a good starting point.

EDIT: WAAAAIT a minute. So are you saying that .* doesn't include \s, \n, or \r, but [^\.] (for example) does? In other words, searches that use . match everything except \n, \s, and \r, whereas searches that use ^ match every value (including \n, \r, and \s) except the value that follows it? I think (I hope) this is becoming slightly clearer. So then, is there an expression that would search for the kind of string I'm looking for now but restrict the search to a single line of code? I.e., maybe something that uses ^ to negate \n values?
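For what it's worth, by default the dot skips only line-break characters (it still matches spaces and tabs), while a negated class such as [^.] matches anything not listed inside it, line breaks included. Listing the line-break characters in the negated class is one way to keep a match on a single line. A small sketch in Python, whose `re` module is assumed here to behave like PCRE for these constructs:

```python
import re

sample = "<p>?Or the television reports??</p>\n\n<p>?No.?</p>"

# [^.] means "any character except a full stop", and that includes \n,
# so this match runs across the paragraph break.
across = re.search(r'\?([^.]*)\.\?', sample)
print(repr(across.group(0)))

# Excluding the line-break characters keeps the match inside one line;
# excluding '?' as well stops it from starting at an earlier stand-in.
one_line = re.search(r'\?([^.?\r\n]*)\.\?', sample)
print(repr(one_line.group(0)))
```

The same idea carries over to the other three endings by swapping the `\.` before the final `\?` for `!`, `,`, or `\?`.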

Last edited by ElMiko; 01-28-2012 at 12:38 AM.
Old 01-28-2012, 04:39 PM   #15
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Let us use the following test text:
Code:
This is an example paragraph of text
it is not very long, nor very correct.

But it should be enough.
Using the default of "dot does not match newline"
Consider the expression: .+
You will have three matches :
Code:
1(This is an example paragraph of text)
2(it is not very long, nor very correct.)

3(But it should be enough.)
Now consider the expression: .+\s+.+
You will have two matches:
Code:
1(This is an example paragraph of text
it is not very long, nor very correct.)

2(But it should be enough.)
This is caused by searching explicitly for \s, which does match newline. Remember that the default in this case was that dot does NOT match newline. To allow dot to match newline, we use (?s).

If we consider: (?s).*
You will have one match:
Code:
1(This is an example paragraph of text
it is not very long, nor very correct.

But it should be enough.)
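The three cases so far can be reproduced with a few lines of Python. This is a sketch that assumes Python's `re` module behaves like Sigil's PCRE engine for these constructs and that the test text uses plain \n line endings. It uses `(?s).+` rather than `(?s).*` to avoid the extra empty match that Python's `finditer` reports at the very end of the string.

```python
import re

text = (
    "This is an example paragraph of text\n"
    "it is not very long, nor very correct.\n"
    "\n"
    "But it should be enough."
)

def spans(pattern):
    """Return the text of every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

per_line = spans(r'.+')       # dot stops at each line break
joined = spans(r'.+\s+.+')    # \s is allowed to cross the line break
whole = spans(r'(?s).+')      # (?s) lets the dot cross line breaks too

print(len(per_line), len(joined), len(whole))
```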
So far it's simple enough, however it does not show the reason why I'm making sure that you take note of the \s matches specifically. A lot of expressions will need you to use \s+ or similar, however this will allow you to escape the 'single line', which is bad.
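One way to avoid that escape is to spell out the whitespace you actually want, such as spaces and tabs only. A brief Python sketch with a made-up sample string (again assuming Python's `re` matches PCRE's behaviour here):

```python
import re

# Invented sample: the second "chapter" is split from its word by a line break.
text = "chapter   one\nchapter\ntwo"

# \s+ accepts the line break, so "chapter\ntwo" is matched as well.
loose = re.findall(r'chapter\s+\w+', text)

# [ \t]+ accepts only spaces and tabs, so the match stays on one line.
strict = re.findall(r'chapter[ \t]+\w+', text)

print(loose)
print(strict)
```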

This happens because, by default, the searched string is treated as a single long line. This means that it's effectively seen as:
Code:
^This is an example paragraph of text\r\nit is not very long, nor very correct\.\r\n\r\nBut it should be enough\.$

\s will always match those \r and \n characters, so you need to be careful with \s whether or not dot matches newline. This is why there is multiline matching: the anchors in the above text move to their logical positions. Rather than sitting only at the start and end of the whole string, they now match at the start and end of each line, making it look more like:
Code:
^This is an example paragraph of text$\r\n^it is not very long, nor very correct\.$\r\n\r\n^But it should be enough\.$

So that we can evaluate lines more accurately. For example, let us match a line and the following line, which starts with "it is not": (?m)^(.+)\s+^(it is not.+)$
Code:
1/1(This is an example paragraph of text)
1/2(it is not very long, nor very correct.)

But it should be enough.
1/2 being (first match, group 2)

True to the line restriction, there would be no match if we searched in:
Code:
This is an example paragraph of text it is not very long, nor very correct.

But it should be enough.
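Both outcomes can be checked with a short Python sketch. It assumes Python's `re` module treats (?m), ^ and $ the way PCRE does for this expression, and it uses plain \n line endings in the test text.

```python
import re

pattern = re.compile(r'(?m)^(.+)\s+^(it is not.+)$')

two_lines = (
    "This is an example paragraph of text\n"
    "it is not very long, nor very correct.\n"
    "\n"
    "But it should be enough."
)
one_line = (
    "This is an example paragraph of text it is not very long, nor very correct.\n"
    "\n"
    "But it should be enough."
)

m = pattern.search(two_lines)
print(m.group(1))                 # group 1: the first line
print(m.group(2))                 # group 2: the line starting with "it is not"
print(pattern.search(one_line))   # no line starts with "it is not" here
```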