MobileRead Forums - View Single Post - KOReader: a document reader for PDF, DJVU, EPUB, FB2, HTML, ... (GPLv3)

sadowski · 01-23-2015, 05:41 AM

The "fuzzy" search algorithm that stardict implements sometimes gives inadequate results, as in this example:

"handen" (Swedish "the hand", article -en is appended to "hand")

returns 10 hits in this order (from a Swedish dictionary), none of them even close:

anden, banden, bandens, handel, handeln, handels, hinden, hindens, hunden, tanden

but not the correct match "hand", which is in fact contained in the dictionary.
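To see why "hand" can lose out, here is a minimal sketch that computes plain Levenshtein (edit) distances from the query to each hit and to "hand". This assumes the fuzzy search ranks by something like edit distance; it is not taken from the stardict source, and the function name `levenshtein` is just for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


query = "handen"
hits = ["anden", "banden", "bandens", "handel", "handeln",
        "handels", "hinden", "hindens", "hunden", "tanden"]

# Distance from the query to every returned hit, plus the wanted headword.
distances = {word: levenshtein(query, word) for word in hits + ["hand"]}
for word, dist in distances.items():
    print(f"{word:8} {dist}")
```

With this measure "hand" is two edits away from "handen", while seven of the ten hits are only one edit away. Any distance-ranked list with a cut-off can therefore push the real root out of the results.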

There seem to be 2 problems with this look-up algorithm:

1. If there is no exact match, stardict falls back to a fuzzy search, allowing character replacements/insertions/deletions anywhere in the word. It would be more useful to return a list of words starting with the query.

2. In most European languages, word roots can be found by manipulating endings, e.g., typically --> typical. This is language-specific but makes dictionaries much more efficient.
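The two points above could be combined into a fallback chain roughly like the sketch below: exact match first, then suffix stripping, then words starting with the query. Everything here is an assumption for illustration: `lookup`, `SUFFIXES` and the sample headword list are made up, the suffix list is a tiny hand-picked set of Swedish endings, and none of it reflects how stardict or KOReader actually work.

```python
# Hypothetical, deliberately tiny list of Swedish endings (illustration only).
SUFFIXES = ("arna", "erna", "en", "et", "ar", "er", "or", "n", "s")


def lookup(query, headwords, suffixes=SUFFIXES, limit=10):
    """Return candidates: exact match, else stripped stems, else prefix matches."""
    words = set(headwords)
    if query in words:
        return [query]

    # Strip known endings, longest first, and keep stems that are headwords.
    stems = []
    for suffix in sorted(suffixes, key=len, reverse=True):
        if query.endswith(suffix) and len(query) > len(suffix):
            stem = query[:-len(suffix)]
            if stem in words and stem not in stems:
                stems.append(stem)
    if stems:
        return stems

    # Last resort: headwords that start with the query.
    return sorted(w for w in words if w.startswith(query))[:limit]


headwords = ["anden", "banden", "hand", "handel", "handeln", "hunden", "tanden"]
print(lookup("handen", headwords))  # stem found by stripping "-en"
print(lookup("hande", headwords))   # no stem, falls back to prefix matches
```

With this sample list, "handen" resolves to "hand" through the "-en" ending, and "hande" falls back to the prefix matches "handel" and "handeln". A real implementation would need a per-language suffix table, which is exactly the language-specific part.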

Anyone else stumbled over this? Any suggestions?

Jens

Post #689 · Dictionary: fuzzy stardict search