04-19-2011, 08:33 AM   #1
kakkalla
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
Regular expression search in authors

Hi there.

I am trying to find all the books in my calibre library whose author is entered as the surname, then a comma, then the first name.

I have tried the following in the Advanced Search window. I select Regular Expression, then choose authors in the drop-down menu, and in the field I type:

[\w],\s[\w]

I have also tried [A-Za-z], [A-Za-z]

and hit return. But neither of these works.

Can someone please tell me how to achieve this?

Thank you.
04-19-2011, 09:52 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by kakkalla
[\w],\s[\w]
You cannot use escape sequences in sets, IIRC. Also, you forgot to use quantifiers. Using
Code:
\w+,\s\w+
should do what you want.
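A quick way to see what the quantifier changes is to try both patterns with Python's `re` module on their own, outside calibre's query handling (a sketch; the author string is a made-up example). Note that Python's `re` does accept `\w` inside a set, so `[\w]` is legal there; it simply matches a single character.

```python
import re

author = "Smith, John"  # hypothetical author string

# Without quantifiers, each [\w] matches exactly one word character,
# so the match covers only the letters right next to the comma.
short = re.search(r"[\w],\s[\w]", author)
print(short.group())

# With '+', each \w+ consumes the whole run of word characters.
full = re.search(r"\w+,\s\w+", author)
print(full.group())
```

The first search finds only `h, J`; the second finds the whole `Smith, John`.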
04-19-2011, 10:01 AM   #3
chaley
Grand Sorcerer
Posts: 12,375
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Two issues:

- literal backslashes must be escaped with a backslash.
- the comma is treated strangely, because the 'real' character in the field is a '|'. This is a bug.

The following works today.
Code:
authors:"~^\\w*\\, \\w"
I will fix the strange comma problem. After the fix is released, then the following will work
Code:
authors:"~^\\w*, \\w"
04-19-2011, 10:31 AM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by chaley
Two issues:

- literal backslashes must be escaped with a backslash.
- the comma is treated strangely, because the 'real' character in the field is a '|'. This is a bug.

The following works today.
Code:
authors:"~^\\w*\\, \\w"
If the 'real' character in the field is a '|', then why does that work?

Quote:
I will fix the strange comma problem. After the fix is released, then the following will work
Code:
authors:"~^\\w*, \\w"
Can you use upper case \W as the negated word character class or [A-Z]? My recollection is that caps are all forced to lower, but I'd like to verify that.
04-19-2011, 10:53 AM   #5
chaley
Grand Sorcerer
Posts: 12,375
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Starson17
If the 'real' character in the field is a '|', then why does that work?
Because the comma in the query is changed to a '|' when searching authors. The fix for the bug is to change the comma to r'\|' if regexp mode is being used.
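To make the mechanism concrete, here is a rough Python model of it. It is not calibre's code: the helpers `buggy_rewrite`, `fixed_rewrite` and `matching` are hypothetical, and the sketch assumes that a comma inside an author name is stored as '|' (so "Smith, John" is held as "Smith| John") and that the search parser has already reduced each doubled backslash to a single one.

```python
import re

# Assumption: a comma inside an author name is stored as '|'.
STORED_AUTHORS = ["Smith| John", "John Smith"]

def buggy_rewrite(pattern: str) -> str:
    # Hypothetical pre-fix behaviour: every comma becomes a bare '|',
    # which the regex engine reads as alternation.
    return pattern.replace(",", "|")

def fixed_rewrite(pattern: str) -> str:
    # Hypothetical post-fix behaviour in regexp mode: the comma becomes
    # an escaped, literal '|'.
    return pattern.replace(",", r"\|")

def matching(pattern: str) -> list[str]:
    # Regexp searches are caseless, so IGNORECASE is always on here.
    rx = re.compile(pattern, re.IGNORECASE)
    return [a for a in STORED_AUTHORS if rx.search(a)]

# Escaped comma: '\,' turns into '\|', a literal pipe.
print(matching(buggy_rewrite(r"^\w*\, \w")))
# Unescaped comma: ', ' turns into '| ', splitting the pattern in two.
print(matching(buggy_rewrite(r"^\w*, \w")))
# With the fix, the unescaped comma behaves as expected.
print(matching(fixed_rewrite(r"^\w*, \w")))
```

The middle case is the "strange" behaviour: the bare '|' makes `^\w*` a complete alternative on its own, and that matches an empty string at the start of every author, so everything is found.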
Quote:
Can you use upper case \W as the negated word character class or [A-Z]? My recollection is that caps are all forced to lower, but I'd like to verify that.
Query strings are not forced to lower in regexp mode. However, the regexp searches are caseless; ignore case is forced on.
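A small Python illustration of why leaving the query's case alone matters (a sketch, not calibre's code; the author string is made up): if the pattern were lowercased, `\W` would silently become `\w`, which inverts its meaning.

```python
import re

author = "Smith, John"  # hypothetical author string
pattern = r"\W"         # non-word characters

# Pattern left as typed: finds the comma and the space.
as_typed = re.findall(pattern, author)
print(as_typed)

# If the query were forced to lower case, \W would turn into \w,
# which matches the letters instead.
lowered = re.findall(pattern.lower(), author)
print(lowered)
```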

Last edited by Starson17; 04-19-2011 at 11:15 AM.
04-19-2011, 11:20 AM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by chaley
Because the comma in the query is changed to a '|' when searching authors. The fix for the bug is to change the comma to r'\|' if regexp mode is being used.
Got it.
Quote:
Query strings are not forced to lower in regexp mode. However, the regexp searches are caseless; ignore case is forced on.
So does that mean that \W as the negated word character class will work fine, but [A-Z] will match [a-z] characters in the ignore case search?
(Note: I edited your post to add a missing slash, so I could quote it.)

Last edited by Starson17; 04-19-2011 at 11:23 AM.
04-19-2011, 01:11 PM   #7
chaley
Grand Sorcerer
Posts: 12,375
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Starson17
Got it.

So does that mean that \W as the negated word character class will work fine, but [A-Z] will match [a-z] characters in the ignore case search?
That is my expectation. Character classes that don't depend on case should work without surprise. Classes that do depend on case will match both cases, even if the class contains only one of them.
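The same expectation can be checked with Python's `re` module, using `re.IGNORECASE` as a stand-in for calibre forcing ignore-case on regexp searches (an assumption about how that is done internally; the strings are made-up examples).

```python
import re

CASELESS = re.IGNORECASE  # stand-in for calibre's forced ignore-case

# Case-dependent class: [A-Z] also accepts lower-case letters once case is ignored.
caseless_hit = re.fullmatch(r"[A-Z]+", "smith", CASELESS) is not None
case_sensitive_hit = re.fullmatch(r"[A-Z]+", "smith") is not None
print(caseless_hit, case_sensitive_hit)

# Case-independent class: \W behaves exactly as it would without the flag.
non_word = re.findall(r"\W", "Smith, John", CASELESS)
print(non_word)
```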
04-19-2011, 08:17 PM   #8
kakkalla
Zealot
Posts: 146
Karma: 194
Join Date: Jun 2010
Location: Melbourne
Device: iPad
Thanks so much!

This:

authors:"~^\\w*\\, \\w"

works like a charm. Thanks again.

However, I am confused. Why the two backslashes before the w? I have a reference guide that says (quote):

\d, \w and \s are shorthand character classes ... can be used inside and outside character classes.

So my thinking was that I need "any letter", followed by a comma, then a space, then any letter. Hence I should be able to use \w*,\s\w*. So, apart from the comma bug, this should work, shouldn't it?

Last edited by kakkalla; 04-19-2011 at 08:31 PM.
04-20-2011, 03:22 AM   #9
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
If I understood the matter correctly, you need to escape the backslashes for the search parser in order for them to stay in place as a single backslash for the regex parser.
04-20-2011, 04:24 AM   #10
chaley
Grand Sorcerer
Posts: 12,375
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Manichean
If I understood the matter correctly, you need to escape the backslashes for the search parser in order for them to stay in place as a single backslash for the regex parser.
You have it right.

Backslashing things is always a problem because the backslash is often used to 'escape' other special characters by each piece of the processing chain. In calibre, there are two such pieces: the search language parser and the search engine. Each will process backslashes.

The string passed to search is first processed by the search language parser. Backslashes are used as escapes: for example a \" means that the quote is part of the query and not the end of a query segment. Following fairly universal rules, all escaping backslashes are removed after processing. As such, \w is processed, determined to mean 'w', and the backslash is removed. The result is passed to the search engine, where additional escape processing is done, for example for regular expressions.

Because of the dual processing, if you want to pass a real backslash to the search engine, you must escape it using a doubled backslash. Thus \\w instead of \w.
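A toy Python model of the two stages may help. `parser_unescape` is a hypothetical stand-in that implements only the rule described above (a backslash escapes the next character and is then removed); calibre's real parser does more, and the comma handling discussed earlier in the thread is ignored here.

```python
import re

def parser_unescape(query: str) -> str:
    """Hypothetical stand-in for the search language parser: a backslash
    escapes the next character and is then dropped."""
    out = []
    chars = iter(query)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep only the escaped character
        else:
            out.append(ch)
    return "".join(out)

author = "Smith, John"  # hypothetical author string

for typed in (r"^\w*, \w", r"^\\w*, \\w"):
    engine_sees = parser_unescape(typed)  # stage 1: search language parser
    matched = re.search(engine_sees, author, re.IGNORECASE) is not None  # stage 2: regex engine
    print(typed, "->", engine_sees, "matches:", matched)
```

With a single backslash the engine receives `^w*, w`, which looks for literal "w" characters and fails; with doubled backslashes it receives `^\w*, \w` and matches.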
Similar Threads
Thread Thread Starter Forum Replies Last Post
Help with regular expression search/replace bfollowell Sigil 12 06-20-2013 07:36 PM
Regular Expression Help Azhad Calibre 86 09-27-2011 02:37 PM
Search & Replace - Regular expression oldbwl Calibre 2 01-09-2011 09:33 AM
Regular Expression Help iKarampa Calibre 13 12-15-2010 07:17 AM
Regular expression help krendk Calibre 4 12-04-2010 04:32 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.