Old 07-29-2014, 04:04 PM   #1
DaltonST
SQLite: Case-insensitive matching of Unicode characters

According to SQLite Frequently Asked Question #18 at http://sqlite.org/faq.html#q18 :

(18) Case-insensitive matching of Unicode characters does not work.
The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability. Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads.

So COLLATE NOCASE in a SQLite table definition or in a SELECT is only good for pure ASCII comparisons, unless the overloads described above have been implemented.
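To make the limitation concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not a claim about how Calibre does it. The collation name UNICODE_NOCASE, the tags table, and the find() helper are all hypothetical names made up for this illustration, and the comparison relies on Python's str.casefold() rather than ICU.

```python
import sqlite3


def unicode_nocase(a: str, b: str) -> int:
    """Compare two strings after Unicode case folding.

    Returns a negative, zero, or positive integer, as SQLite expects
    from a collation callback.
    """
    fa, fb = a.casefold(), b.casefold()
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")

# Hypothetical collation name; registered alongside the built-in NOCASE
# instead of overloading it.
conn.create_collation("UNICODE_NOCASE", unicode_nocase)

# Hypothetical table standing in for a list of tags.
conn.execute("CREATE TABLE tags (name TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?)",
    [("SACHBÜCH",), ("Gerçek",), ("fiction",)],
)


def find(term: str, collation: str) -> list:
    """Return tag names equal to `term` under the given collation."""
    # The collation name is interpolated only because it comes from
    # this script, never from user input.
    sql = f"SELECT name FROM tags WHERE name = ? COLLATE {collation}"
    return [row[0] for row in conn.execute(sql, (term,))]


ascii_only_hits = find("sachbüch", "NOCASE")       # built-in, ASCII folding only
unicode_hits = find("sachbüch", "UNICODE_NOCASE")  # custom, Unicode folding
ascii_word_hits = find("FICTION", "NOCASE")        # pure ASCII still matches

print(ascii_only_hits, unicode_hits, ascii_word_hits)
```

With the built-in NOCASE the lowercase search term does not match the stored uppercase tag, because Ü and ü are not folded together. With the custom collation it does. A collation registered this way exists only on the connection that created it, so every connection that opens the database would need to register it again.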

Does anyone know whether Calibre's SQLite already implements case-insensitive matching of Unicode (UTF-8) text as described above?

For example, Calibre would need this capability when searching for Tags that contain non-ASCII characters, such as the German word sachbüch, the Hindi word NAHĪMṀ, the Spanish word noficción, the Turkish word gerçek, and so forth. The same applies to Authors, Title, and Series.

Thanks in advance.

Last edited by DaltonST; 07-29-2014 at 08:51 PM.