

Old 10-12-2011, 05:52 PM   #1
mmholt
Member
mmholt began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Sep 2011
Device: Kindle 2
Help with a search & replace

I want to locate any authors with two or more initials with periods, each separated by a space, like "A. B." or "C. D. E."

I've worked out a regex that will find them, but I can't figure out how to remove the space between two initials. Help?
Old 10-13-2011, 04:53 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I don't understand what you want to remove. Could you post some examples of what values you currently have in your library and what you want to change them to?

Spaces are best matched with either a literal space or \s, which matches any whitespace character.
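To illustrate the difference, here is a small sketch using Python's standard `re` module, assuming calibre's regex mode behaves the same way for these simple patterns. The author value with a tab in it is made up for the example:

```python
import re

name = "A.\tB. Smith"  # hypothetical author value with a tab between initials

# A literal space in the pattern only matches the space character.
literal = re.findall(r"\w\. ", name)
# \s matches any whitespace character, including the tab.
general = re.findall(r"\w\.\s", name)

print(literal)  # ['B. ']
print(general)  # ['A.\t', 'B. ']
```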
 
Old 10-13-2011, 04:56 AM   #3
chaley
"chaley", not "charley"
 
Posts: 5,793
Karma: 1212788
Join Date: Jan 2010
Location: France
Device: Many android devices
Search field: authors
Search for: (\w\.) (?=\w\.)
Replace with: \1
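The same search and replace can be tried outside calibre with Python's `re.sub`. This is a sketch that assumes calibre's regex mode treats this pattern the same way Python's `re` does; the helper name `join_initials` and the sample names are hypothetical:

```python
import re

PATTERN = r"(\w\.) (?=\w\.)"  # "Search for" from the post above
REPLACEMENT = r"\1"           # "Replace with" from the post above

def join_initials(author: str) -> str:
    """Remove the space between consecutive initials such as 'A. B.'."""
    return re.sub(PATTERN, REPLACEMENT, author)

for name in ["A. B. Smith", "C. D. E. Jones", "Jane Doe"]:
    print(join_initials(name))
# A.B. Smith
# C.D.E. Jones
# Jane Doe
```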
Old 10-14-2011, 02:16 PM   #4
mmholt
Member
mmholt began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Sep 2011
Device: Kindle 2
Quote:
Originally Posted by Manichean View Post
I don't understand what you want to remove. Could you post some examples of what values you currently have in your library and what you want to change them to?
If an author's name contains a string like "A. A. A." or "A. A.", I wanted to replace those with "A.A.A." or "A.A."


Quote:
Originally Posted by chaley View Post
Search field: authors
Search for: (\w\.) (?=\w\.)
Replace with: \1
That does exactly what I wanted - thank you very much. But I don't understand why it works. Enlighten me, please?
Old 10-14-2011, 03:22 PM   #5
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mmholt View Post
Enlighten me, please?
Code:
Search for: (\w\.) (?=\w\.)
Replace with: \1
\w is a single word character (like a letter)
\w\. is a single word character followed by a period (the \. means a period, while the dot alone without the escape backslash is a wild card for any character.)
So "(\w\.) " means a single word character followed by a period followed by a space (note the space there).
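A quick way to see the difference between the escaped \. and the bare . wildcard, sketched with Python's `re.fullmatch` (the test strings are invented):

```python
import re

escaped_ok = bool(re.fullmatch(r"\w\.", "A."))   # literal period present
escaped_bad = bool(re.fullmatch(r"\w\.", "Ab"))  # "b" is not a period
wildcard = bool(re.fullmatch(r"\w.", "Ab"))      # bare "." accepts any character here

print(escaped_ok, escaped_bad, wildcard)  # True False True
```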

(?=\w\.) means to only find "a single word character followed by a period followed by a space" if the space is followed by "a single word character followed by a period". The pattern (?= is a positive lookahead assertion. It lets the preceding match only when the following matches, but the lookahead part doesn't "eat up" any of the string.

For example, the regex "Isaac (?=Asimov)" will match "Isaac " only if it’s followed by "Asimov".
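The "doesn't eat up" part can be checked directly. In this sketch (Python's `re`, assumed to match calibre's behaviour here), the match text and its end position show that "Asimov" is tested but not consumed:

```python
import re

m = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
print(repr(m.group(0)))  # 'Isaac ': "Asimov" is checked but not part of the match
print(m.end())           # 6: the next search would start at "Asimov"

no_match = re.search(r"Isaac (?=Asimov)", "Isaac Newton")
print(no_match)          # None: the lookahead fails, so nothing matches
```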

Last edited by Starson17; 10-14-2011 at 03:27 PM.
Old 10-14-2011, 08:31 PM   #6
mmholt
Member
mmholt began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Sep 2011
Device: Kindle 2
Thanks for all the details. I still don't understand the "Replace with". How does that remove the space?
Old 10-14-2011, 09:36 PM   #7
theducks
Grand Sorcerer
Posts: 15,071
Karma: 5939999
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by mmholt View Post
Thanks for all the details. I still don't understand the "Replace with". How does that remove the space?
I think that should be (a back reference for each match)
\1\2

Since the space between the other match elements is not inside a reference (), it will be lost when replacing with only the 2 back references.
Old 10-14-2011, 10:21 PM   #8
PeterT
Taking a break; Fed up
Posts: 7,183
Karma: 45264785
Join Date: Nov 2007
Location: Toronto
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
Quote:
Originally Posted by theducks View Post
I think that should be (a back reference for each match)
\1\2

Since the space between the other match elements is not inside a reference (), it will be lost when replacing with only the 2 back references.
Actually, I think the way it works is that the entire regex ONLY matches the first occurrence of a single word character followed by a period and space.

Since the (?=\w\.) is, as Starson17 says, "(?= is a positive lookahead assertion. It lets the preceding match only when the following matches, but the lookahead part doesn't "eat up" any of the string.", the only characters "consumed" by the regex are the initial sequence "(\w\.) ", and that is replaced by \1, which is that initial \w\. sequence.
Old 10-14-2011, 10:23 PM   #9
DoctorOhh
US Navy, Retired
Posts: 8,885
Karma: 12755553
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by theducks View Post
I think that should be (a back reference for each match)
\1\2
Since he already stated that the S&R worked flawlessly, I'm guessing the \2 isn't required.

Quote:
Originally Posted by theducks View Post
Since the space between other match elements is not inside a reference (), it will be lost when replacing only the 2 back references.
Good explanation.
Old 10-15-2011, 04:07 AM   #10
chaley
"chaley", not "charley"
 
Posts: 5,793
Karma: 1212788
Join Date: Jan 2010
Location: France
Device: Many android devices
Quote:
Originally Posted by theducks View Post
I think that should be (a back reference for each match)
\1\2
No. The second parenthesized expression (the positive lookahead) does not create a group that can be back referenced. Adding the \2 will generate an error, because there is only one group.
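This can be confirmed with Python's `re` module (assumed here to behave like calibre's regex mode for this pattern): the compiled pattern reports a single capturing group, and a replacement that refers to \2 is rejected before any substitution happens:

```python
import re

pattern = r"(\w\.) (?=\w\.)"
groups = re.compile(pattern).groups
print(groups)  # 1: the (?=...) lookahead does not capture

try:
    re.sub(pattern, r"\1\2", "A. B. Smith")
    error_message = None
except re.error as exc:
    error_message = str(exc)  # an "invalid group reference" error
print(error_message)
```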

Quote:
Originally Posted by PeterT View Post
Actually, I think the way it works is that the entire regex ONLY matches the first occurrence of a single word character followed by a period and space.
It matches N occurrences of "letter dot space" -- an "initial". What it does not match is the last initial, preventing removing the space between that last initial and the following word.
Quote:
Since the (?=\w\.) is, as Starson17 says, "(?= is a positive lookahead assertion. It lets the preceding match only when the following matches, but the lookahead part doesn't "eat up" any of the string.", the only characters "consumed" by the regex are the initial sequence "(\w\.) ", and that is replaced by \1, which is that initial \w\. sequence.
One thing to remember: matching and substitution in calibre's search/replace (and generally in regular expressions) is leftmost non-overlapping. This means that the expression will operate on the first string that matches, then start again at the left side of what remains. Because the lookahead assertion does not consume characters, what "remains" is the next initial, and the regexp process is run again on that initial and whatever follows it. This process repeats until the expression fails to match something, which will happen when there are no remaining initials followed by an initial.
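The left-to-right, non-overlapping scan can be made visible with Python's `re.finditer`, which reports each match in the same order a substitution would process it. This is a sketch assuming calibre scans the same way; the name is made up:

```python
import re

pattern = re.compile(r"(\w\.) (?=\w\.)")
name = "A. B. C. Smith"

spans = []
for m in pattern.finditer(name):
    # Each match consumes "letter, period, space"; the lookahead only
    # peeks at the next initial, so the scan resumes right at it.
    spans.append(m.span())
    print(m.span(), repr(m.group(0)))
# (0, 3) 'A. '
# (3, 6) 'B. '
```

"C. " is not matched because it is followed by "Sm", not by another initial, so the space before the surname survives.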

Note that "leftmost non-overlapping" does not imply either "adjacent" or "leading". It simply means that the input string is scanned from left to right. For example, regarding adjacent: there is no requirement that there be only one set of initials. Given the rather bizarre author name "A. B. Someword C. D. Lastname", the expression will match the A. and the C., resulting in "A.B. Someword C.D. Lastname". Regarding leading: the name "Joe A. B. Smith" will be changed to "Joe A.B. Smith".
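Both examples can be reproduced with a short sketch using Python's `re.sub`, assuming calibre's regex mode gives the same result for this pattern. The helper name `join_initials` is hypothetical:

```python
import re

def join_initials(author: str) -> str:
    # Same "Search for" / "Replace with" pair as in the thread.
    return re.sub(r"(\w\.) (?=\w\.)", r"\1", author)

bizarre = join_initials("A. B. Someword C. D. Lastname")
middle = join_initials("Joe A. B. Smith")
print(bizarre)  # A.B. Someword C.D. Lastname
print(middle)   # Joe A.B. Smith
```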
Old 10-17-2011, 10:10 AM   #11
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mmholt View Post
Thanks for all the details. I still don't understand the "Replace with". How does that remove the space?
You got lots of excellent information, but in case the answer to this question wasn't clear: it removes the space because there is a space after "(\w\.)" in the "Search for" part. That means the word character, the period, and the space will all be "eaten up", or as chaley correctly put it, "consumed" by the search. Of course, those three characters will only be eaten up if the positive lookahead assertion is matched (another \w\. follows the first one). However, the "Replace with" part doesn't have a space. It has just \1, a back reference to the group of two characters captured by (\w\.): a word character followed by a period. So the three-character string "word character, period, space" that is consumed (subject to the lookahead) gets replaced with a two-character string that is the same as the three-character string, minus the space. As chaley said, the process then starts again.

Simple
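For anyone who wants to see the "three characters in, two characters out" step on its own, here is a sketch with Python's `re` (assumed equivalent to calibre's regex mode for this pattern). `Match.expand` applies a replacement template such as \1 to a single match:

```python
import re

m = re.search(r"(\w\.) (?=\w\.)", "A. B. Smith")
consumed = m.group(0)          # the text the search "eats up"
replacement = m.expand(r"\1")  # what "Replace with: \1" turns it into

print(repr(consumed), len(consumed))        # 'A. ' 3
print(repr(replacement), len(replacement))  # 'A.' 2
```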
Old 10-21-2011, 07:49 PM   #12
mmholt
Member
mmholt began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Sep 2011
Device: Kindle 2
Thank you all for the awesome replies. It was all very helpful!
Similar Threads
Thread Thread Starter Forum Replies Last Post
Search & Replace :help: krussell Calibre 3 08-02-2011 05:45 PM
Search & Replace/Regex help!! millertime13 Conversion 4 07-22-2011 03:40 AM
Search & Replace Suggestion Philosopher Calibre 6 12-31-2010 12:55 PM
Search & Replace Pat Nickholds Sigil 2 10-22-2010 12:18 AM
Search & replace TEXT ToeRag Calibre 3 04-10-2010 02:44 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.