07-29-2014, 04:04 PM   #1
DaltonST
SQLite: Case-insensitive matching of Unicode characters

According to SQLite Frequently Asked Question #18 at http://sqlite.org/faq.html#q18 :

(18) Case-insensitive matching of Unicode characters does not work.
The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability. Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads.

So COLLATE NOCASE, whether in a SQLite table definition or in a SELECT, is only good for pure ASCII comparisons, unless the overloads described above have been implemented.
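
To make the FAQ's point concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not Calibre's code. The collation name unicode_nocase and the tags table are hypothetical, and Python's str.casefold() is used as a simple stand-in for the ICU routines the FAQ mentions. It registers a custom collation (Python's wrapper around sqlite3_create_collation()) and compares it with the built-in NOCASE.

```python
import sqlite3


def unicode_nocase(a: str, b: str) -> int:
    """Compare two strings after Unicode case folding.

    Returns a negative, zero, or positive integer, as SQLite expects
    from a collation callback. casefold() is an assumption standing in
    for ICU; it is not locale-aware.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# "unicode_nocase" is a hypothetical name chosen for this example.
conn.create_collation("unicode_nocase", unicode_nocase)

conn.execute("CREATE TABLE tags (name TEXT)")
conn.execute("INSERT INTO tags (name) VALUES (?)", ("GERÇEK",))

# Built-in NOCASE only folds the ASCII letters A-Z.
ascii_only = conn.execute(
    "SELECT name FROM tags WHERE name = ? COLLATE NOCASE", ("gerçek",)
).fetchall()

# The custom collation folds non-ASCII letters as well.
unicode_aware = conn.execute(
    "SELECT name FROM tags WHERE name = ? COLLATE unicode_nocase", ("gerçek",)
).fetchall()

print("NOCASE:", ascii_only)
print("unicode_nocase:", unicode_aware)
```

With a default SQLite build, the NOCASE query should find nothing because Ç and ç are outside the ASCII range, while the custom collation should match the stored tag. A collation registered this way exists only on the connection that registered it.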

Does anyone know whether Calibre's SQLite already implements case-insensitive matching of Unicode (UTF-8) text as described above?

For example, Calibre would need this capability when searching for Tags that contain non-ASCII characters, such as the German word sachbüch, the Hindi word NAHĪMṀ, the Spanish word noficción, the Turkish word gerçek, and so forth. The same applies to Authors, Title, and Series.

Thanks in advance.

Last edited by DaltonST; 07-29-2014 at 08:51 PM.
07-29-2014, 11:11 PM   #2
kovidgoyal
creator of calibre
calibre uses SQLite only as a disk storage format, not as a database. All sorting and searching is performed using ICU on an in-memory, normalized view of the data from the database.
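
As a rough illustration of that approach (not calibre's actual implementation, which is not shown in this thread), the sketch below loads values into memory, builds a normalized key for each one, and searches and sorts on those keys instead of asking SQLite to compare text. The names InMemoryTagView and normalize_key are hypothetical, and unicodedata plus str.casefold() are assumptions standing in for ICU, which does locale-aware collation that this sketch does not attempt.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Build a comparison key: Unicode-normalize, then case-fold.

    This is a simplified stand-in for ICU-based normalization.
    """
    return unicodedata.normalize("NFKC", text).casefold()


class InMemoryTagView:
    """Hypothetical in-memory view over values read from a database."""

    def __init__(self, rows):
        # Map each normalized key to the original spellings that share it.
        self._by_key = {}
        for tag in rows:
            self._by_key.setdefault(normalize_key(tag), []).append(tag)

    def search(self, query: str) -> list:
        """Return stored values equal to the query, ignoring case."""
        return list(self._by_key.get(normalize_key(query), []))

    def sorted_tags(self) -> list:
        """Return all stored values ordered by their normalized keys."""
        all_tags = [t for tags in self._by_key.values() for t in tags]
        return sorted(all_tags, key=normalize_key)


# In a real application the rows would come from a SELECT on the database.
view = InMemoryTagView(["GERÇEK", "Fiction", "fiction"])
print(view.search("gerçek"))
print(view.search("FICTION"))
print(view.sorted_tags())
```

Because all comparisons happen on the normalized keys in application code, the collation that SQLite itself uses does not affect search results in this design.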
09-15-2014, 03:48 AM   #3
rApeNB
When I update cc.db, I get this error:
no such collation sequence: icu
How can I get rid of it?