Old 12-06-2024, 03:03 AM   #1
paperback
Problem finding words with diacriticals in them

One issue with regex is that it does not appear to be able to ignore diacriticals. IME, this is often necessary because diacritical marks may have been incorrectly applied and need to be replaced.

For example, suppose you want to find the word "protege" (i.e. protégé). The search will not return any results if the text has diacriticals in it, unless you enter the letters with exactly the same diacriticals.

Calibre needs to have a setting allowing it to read letters with diacriticals as ordinary letters. This will allow the user to find all instances of the word even if the diacriticals have been incorrectly applied.
Old 12-06-2024, 03:15 AM   #2
kovidgoyal
creator of calibre
Not something I am interested in implementing, too much work to try to get regex engines to act like that. It's a reasonable amount of work to have plain text searches do that, but not something I care enough about to implement.
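For illustration only, a diacritic-insensitive plain text comparison could look roughly like the sketch below. This is not calibre code; the function names and the sample sentence are made up for the example, and it assumes that decomposing to NFD and dropping combining marks is an acceptable definition of "ignoring diacriticals".

```python
import re
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_words_ignoring_diacritics(word, text):
    """Return the words in text that equal word once accents and case are ignored."""
    target = strip_diacritics(word).casefold()
    # Normalize to NFC first so that accented letters stay inside \w+ words.
    words = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    return [w for w in words if strip_diacritics(w).casefold() == target]


sample = "Her protege, his protegé and their protégé met the protègè."
print(find_words_ignoring_diacritics("protege", sample))
```

Letters that have no decomposition (ø or ß, for example) pass through unchanged, so a real implementation would need extra mapping for those.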
Old 12-06-2024, 03:20 AM   #3
Karellen
There are a few ways around that...

1. Search for é to find all instances where it is used,
2. From Tools>Reports>Characters, you can see a listing of all diacritics used and click on them to jump to them
3. Use the Spellcheck which does exactly what you ask...
Attached image: spellchecker.jpg
Old 12-16-2024, 09:53 AM   #4
Biblos
It's not exactly an answer to your question, but a palliative: why not search in Regex mode for: prot.g. (each dot matches any single character)

More precise: prot[eéèê]g[eéèê]
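If typing the character classes by hand gets tedious, the pattern can be generated. The sketch below is only an illustration: the VARIANTS table is a hypothetical, incomplete list of accented forms, and it uses Python's standard re module rather than anything calibre-specific. The printed pattern can then be pasted into the search box in Regex mode.

```python
import re

# Hypothetical, incomplete table of accented variants; extend it as needed.
VARIANTS = {
    "a": "aàáâä",
    "e": "eéèêë",
    "i": "iìíîï",
    "o": "oòóôö",
    "u": "uùúûü",
}


def accent_tolerant_pattern(word):
    """Replace each listed letter with a character class of its accented forms."""
    parts = []
    for ch in word:
        variants = VARIANTS.get(ch)
        parts.append(f"[{variants}]" if variants else re.escape(ch))
    return "".join(parts)


pattern = accent_tolerant_pattern("protege")
print(pattern)
print(re.findall(pattern, "protege protégé protègé prodigy"))
```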

Old 12-17-2024, 06:27 PM   #5
Doitsu
Quote:
Originally Posted by paperback View Post
For example, suppose you want to find the word "protege" (i.e. protégé). It will not return any results if there are diacriticals in it, unless you enter the letters with the exact same diacriticals.
Calibre uses the regex library instead of Python's default re library, and regex has limited support for fuzzy searches.
For example: (these){e<=1} will find these, those, there (and many other strings).
If you search for (protege){e<=2} in Regex mode, it'll find protege, protegé and protégé.
It might of course also find other similar strings...
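To cut down on the unrelated hits, the fuzzy constraint can be limited to substitutions and anchored to whole words. The sketch below tries this outside calibre with the third-party regex module (pip install regex); the function name and its max_subs parameter are made up for the example, and it assumes the text uses precomposed (NFC) accented letters.

```python
import unicodedata


def find_with_substitutions(word, text, max_subs=2):
    """Return whole words matching word with at most max_subs substituted characters."""
    import regex  # third-party module, not the standard library's re

    # {s<=N} permits substitutions only, so a match always has len(word) characters.
    pattern = rf"\b(?:{regex.escape(word)}){{s<={max_subs}}}\b"
    return regex.findall(pattern, unicodedata.normalize("NFC", text))
```

Calling find_with_substitutions("protege", text) should pick up protege, protegé and protégé while skipping a word like prodigy, which differs in three positions. The equivalent expression for the editor's Regex mode would be \b(?:protege){s<=2}\b.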