11-29-2015, 04:24 PM, post #40
DaltonST
New Comparison Transform Function in Version 1.0.20

Quote:
Originally Posted by Gary_M_Mugford
Dalton,

Due to my predilection for Scandinavian mysteries, I find myself with ONE teensy request more: comparisons that ignore the diacritics of the extended character set. I'm not sure whether to consolidate WITH the extended character attributes or just force everything back down to regular ASCII.

Thanks, GM
@GM:

See the attached example from Version 1.0.20.

One word of caution: the further a particular metadata language's alphabet drifts from the Roman alphabet, the less accurate the new 'Compare as: Decomposed & Normalized Alphabet' Transform Function will become. Western European languages should work accurately, but Chinese, Japanese, Korean, Thai, and so forth will be (at best) much less accurate. The only Eastern European language I tested was Polish, and ĶźŽ was treated as equal to KzZ for the purposes of searching, so most of the Slavic languages will likely work well.
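
For anyone curious about the general idea behind "decomposed & normalized," here is a minimal Python sketch of the usual technique: Unicode NFKD decomposition followed by dropping the nonspacing combining marks (accents, cedillas, carons). This is an assumption about how such a comparison can be built, not the actual MCS code; `fold_diacritics` and `same_when_folded` are hypothetical names used only for illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper (not MCS code): strip diacritics, e.g. 'Ž' -> 'Z'.

    NFKD splits a precomposed letter into its base letter plus separate
    combining marks, and also unfolds compatibility forms such as ligatures.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining accents/cedillas/carons are category "Mn" (nonspacing mark);
    # keep everything else, i.e. the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_when_folded(a: str, b: str) -> bool:
    """Compare two metadata values after folding away their diacritics."""
    return fold_diacritics(a) == fold_diacritics(b)


print(fold_diacritics("ĶźŽ"))                          # KzZ
print(same_when_folded("Åsa Larsson", "Asa Larsson"))  # True
```

This also hints at why non-Roman scripts gain little: the technique only helps where a letter is a base character plus removable marks.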

Obviously, if all of the metadata is spelled correctly and consistently in a particular language, the search will work as expected. Problems arise when it is not.

For example, assume that the original title in Polish contained "ĶźŽŦ", but the translated title contained "KzZF". That would fail a check for equality, because the letter "Ŧ" does not transliterate to an "F". MCS would say they are different for that reason alone.
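
A small Python sketch, assuming the comparison works by Unicode decomposition plus removal of combining marks (an assumption, not confirmed MCS internals), shows why this example fails. Letters drawn with a stroke, such as Ŧ, and likewise ø (as in Jo Nesbø), have no Unicode decomposition, so there is no separate mark to strip and they pass through unchanged. `fold_diacritics` is a hypothetical name.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: NFKD-decompose, then drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


original = "ĶźŽŦ"    # title as written in the source language
translated = "KzZF"  # title as it appears in the translation

# Ķ, ź and Ž decompose into base letter + mark; Ŧ has no decomposition.
print(fold_diacritics(original))                                 # KzZŦ
print(fold_diacritics(original) == fold_diacritics(translated))  # False

# Same situation for the Scandinavian ø: it survives folding as-is.
print(fold_diacritics("Jo Nesbø"))                               # Jo Nesbø
```

Treating Ŧ as T, or ø as o, would need an explicit transliteration table on top of normalization.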


DaltonST
Attached thumbnail: mcs_decomposed_and_normalized_alphabet_example.jpg