MobileRead Forums > E-Book Software > Calibre > Editor
Old 11-25-2017, 08:07 AM   #1
calmeilles
Junior Member
calmeilles began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2017
Device: Kindle
Small suggestion for the search regex documentation

I was trying to search with the regex "[A-Z][A-Z][A-Z]", looking for 3 consecutive capital letters, but the search operated case-insensitively, so I was getting results such as ABC, AbC and abc, which came as a considerable surprise.
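A minimal sketch of that symptom, using Python's standard `re` module as an assumed stand-in for calibre's search engine (calibre's own search code isn't shown in this thread). Once the ignore-case flag is on, `[A-Z]{3}` also matches mixed-case and lowercase runs:

```python
import re

text = "ABC AbC abc"

# Without flags, [A-Z] is a literal range of uppercase ASCII letters.
strict = re.findall(r"[A-Z]{3}", text)

# With re.IGNORECASE, the same character class also matches lowercase letters.
loose = re.findall(r"[A-Z]{3}", text, flags=re.IGNORECASE)

print(strict)  # ['ABC']
print(loose)   # ['ABC', 'AbC', 'abc']
```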

I believe that this is because my install has LOCALE=en_GB and the collation order for that is case-insensitive. I may be wrong, but it's my best guess, and it doesn't actually matter.*

The regex documentation page explains how to make a case-sensitive search (which is what we'd normally expect) ignore case with the "(?i)" syntax, but not how to do the reverse. Tracking down what was required turned out to be quite a chore, and I feel a note in the documentation would be useful.

What I ended up with was

Code:
(?-i:[A-Z]{3})
It's the -i form (which switches a flag off) that's missing, and it was quite obscure even in the Python docs.
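For reference, a small sketch of how a scoped `(?-i:...)` group behaves in Python's standard `re` module (modifier spans like this need Python 3.6 or later; that calibre's editor behaves identically is an assumption). Passing `re.IGNORECASE` stands in for a search that is already case-insensitive overall:

```python
import re

text = "ABC aBC abc"

# re.IGNORECASE stands in for a search that is case-insensitive overall;
# the scoped group (?-i:...) switches the i flag off for its contents only.
scoped = re.findall(r"(?-i:[A-Z]{3})", text, flags=re.IGNORECASE)
print(scoped)  # ['ABC']

# The flag can also be dropped for just part of a pattern:
# here the first letter may be either case, the next two must be uppercase.
partial = re.findall(r"[a-z](?-i:[A-Z]{2})", text, flags=re.IGNORECASE)
print(partial)  # ['ABC', 'aBC']
```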

(*I am curious whether this is true or whether something else caused it.
If my guess is right, mentioning that LOCALE can seriously affect your regexes would also be useful.)
Old 11-25-2017, 09:50 AM   #2
gbm
Wizard
gbm ought to be getting tired of karma fortunes by now.
 
Posts: 2,082
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
Originally Posted by calmeilles View Post
I was trying to search with the regex "[A-Z][A-Z][A-Z]", looking for 3 consecutive capital letters, but the search operated case-insensitively, so I was getting results such as ABC, AbC and abc, which came as a considerable surprise.
If you are using the calibre e-book editor, all you needed to do was check the case-sensitive checkbox.

bernie
Attached Thumbnails: Screenshot from 2017-11-25 09-49-27.png, Screenshot from 2017-11-25 09-59-03.png

Last edited by gbm; 11-25-2017 at 10:00 AM. Reason: second screenshot added
Old 11-25-2017, 11:15 AM   #3
calmeilles
Junior Member
calmeilles began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2017
Device: Kindle
Quote:
Originally Posted by gbm View Post
If you are using the calibre e-book editor, all you needed to do was check the case-sensitive checkbox.
Er... I'm downright dumbfounded that that checkbox should affect a REGEX search. Its utility for 'Normal' searches is obvious, but it never even crossed my mind that it would affect a REGEX... that's just... silly!

Even so, I still think noting (?-i) where (?i) is mentioned in the doc would be a nice thing to do.
Old 11-25-2017, 01:18 PM   #4
jackie_w
Grand Sorcerer
jackie_w ought to be getting tired of karma fortunes by now.
 
Posts: 6,206
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by calmeilles View Post
that's just... silly!
... or perhaps it's designed to be more user-friendly for those with little or no experience of regex flags.
Old 11-25-2017, 03:54 PM   #5
calmeilles
Junior Member
calmeilles began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2017
Device: Kindle
Quote:
Originally Posted by jackie_w View Post
... or perhaps it's designed to be more user-friendly for those with little or no experience of regex flags.
Okay, I've given that some thought.

I don't think that anyone who had encountered regexes before would expect them to ignore case; it's one of the first things you learn about them — they're precise (often in frustrating and unexpected ways, but that's a different subject).

Anyone unfamiliar with them who tries them out with the help of the documentation would read, right at the beginning:
Quote:
You’ll notice, though, that this only matches the exact string "Hello, World!", not e.g. "Hello, wOrld!" or "hello, world!" or any other such variation.
And further down:
Quote:
Knew you’d ask. Some useful sets are [0-9] matching a single number, [a-z] matching a single lowercase letter, [A-Z] matching a single uppercase letter, [a-zA-Z] matching a single letter and [a-zA-Z0-9] matching a single letter or number.
This is written on the assumption that case matters; otherwise, [a-z] alone would be sufficient to match both a-z and A-Z.
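The point about sets is easy to check, again assuming Python's standard `re` module behaves like calibre's engine here. With no flags, each set from the quoted tutorial matches only the characters it lists, and `[a-z]` picks up an uppercase letter only once ignore-case is switched on:

```python
import re

sample = "a Z 7"

# Sets quoted from the calibre regex tutorial. With no flags, each
# class matches only the characters it literally lists.
patterns = ["[0-9]", "[a-z]", "[A-Z]", "[a-zA-Z]", "[a-zA-Z0-9]"]
results = {p: re.findall(p, sample) for p in patterns}

for p, found in results.items():
    print(f"{p:12} {found}")

# Only with IGNORECASE does [a-z] also match the uppercase Z.
ci = re.findall("[a-z]", sample, flags=re.IGNORECASE)
print(ci)  # ['a', 'Z']
```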

Further on still, we come to the part that shows how to set the ignore-case flag:
Quote:
In the beginning, you said there was a way to make a regular expression case insensitive?
Yes, I did, thanks for paying attention and reminding me. You can tell calibre how you want certain things handled by using something called flags. You include flags in your expression by using the special construct (?flags go here) where, obviously, you’d replace “flags go here” with the specific flags you want. For ignoring case, the flag is i, thus you include (?i) in your expression. Thus, (?i)test would match “Test”, “tEst”, “TEst” and any case variation you could think of.
Again, written from the perspective that case sensitivity is the norm.
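A short sketch of the quoted `(?i)test` example, using Python's standard `re` module as an assumed stand-in for calibre's engine. `re.fullmatch` is used so that each word has to match the whole pattern:

```python
import re

words = ["Test", "tEst", "TEst", "test", "best"]

# Plain pattern: case-sensitive by default, so only the exact spelling matches.
plain = [w for w in words if re.fullmatch(r"test", w)]

# (?i) at the start switches on ignore-case for the whole pattern.
ignore_case = [w for w in words if re.fullmatch(r"(?i)test", w)]

print(plain)        # ['test']
print(ignore_case)  # ['Test', 'tEst', 'TEst', 'test']
```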

So I still feel that adding the reverse syntax would be nice.

Mentioning the checkbox and its effect at this point could also do users, new and experienced, a favour. The checkbox is referenced only once, right at the end under "Bulk editing metadata", which I skipped because I wasn't bulk editing metadata.

It's not as if I were suggesting a radical change to the application itself, just that a few words be added to the documentation.
Old 11-25-2017, 09:59 PM   #6
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
In the several years that this feature has existed, yours is the first complaint I've heard about that checkbox, so I don't think it is quite as unintuitive as you suggest. However, adding more documentation is always good.

So you are welcome to suggest improvements to the documentation; it is maintained as a simple plain-text file, here:

https://github.com/kovidgoyal/calibr...ual/regexp.rst
Old 12-07-2017, 04:33 AM   #7
Phssthpok
Age improves with wine.
Phssthpok knows how to set a laser printer to stun.
 
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
Quote:
Originally Posted by calmeilles View Post
Er... I'm downright dumbfounded that that checkbox should affect a REGEX search. Its utility for 'Normal' searches is obvious, but it never even crossed my mind that it would affect a REGEX... that's just... silly!
It's the same as the "/.../i" modifier in most regex engines, or grep's -i option. It saves having to write "[Ff][Ii][Nn][Dd] [Tt][Hh][Ii][Ss]" to find "find this" case-insensitively.
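A quick sketch of that equivalence in Python's standard `re` module, chosen here for illustration; `re.IGNORECASE` plays the role of the `/i` modifier, `grep -i`, or the editor's checkbox. Spelling out both cases of every letter finds the same strings as the plain pattern with the flag set:

```python
import re

text = "Find this, FIND THIS, or find this."

# Spelling out every case variant by hand with character classes...
by_hand = re.findall(r"[Ff][Ii][Nn][Dd] [Tt][Hh][Ii][Ss]", text)

# ...matches the same strings as the plain pattern plus one flag.
with_flag = re.findall(r"find this", text, flags=re.IGNORECASE)

print(by_hand == with_flag)  # True
print(with_flag)  # ['Find this', 'FIND THIS', 'find this']
```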

Tags
locale, regex, search



Similar Threads
Thread Thread Starter Forum Replies Last Post
Regex in search problems (NOT Search&Replace; the search bar) lairdb Calibre 3 03-15-2017 07:10 PM
Regex Search doesn't search all files in Edit Book GregTheGrate Editor 8 11-08-2016 12:47 AM
A small suggestion for the settings backup kaufman Calibre Companion 4 09-02-2016 04:04 PM
Small improvement suggestion elibrarian Sigil 4 03-04-2015 06:12 PM
"Setting up a calibre development environment" documentation suggestion trying Development 1 03-30-2014 10:25 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.