Quote:
Originally Posted by sadowski
The "fuzzy" search algorithm that stardict implements can give inadequate results sometimes....
There seem to be 2 problems with this look-up algorithm:
1. If there is no exact match, stardict falls back to a fuzzy search,...
2. In most European languages, word roots can be found by manipulating endings, e.g., typically --> typical. This is language specific but makes dictionaries much more efficient. Anyone else stumbled over this? Any suggestions?
On the koreader/sdcv page, Chrox refers to a man page for more information. Its description states:
Quote:
sdcv is simple, cross-platform text-base utility for work with dictionaries in StarDict's format. The word from "list of words" may be string with leading '/' for using Fuzzy search algorithm, with leading '|' for using full-text search, string may contain '?' and '*' for using regexp search....
sdcv is called in line 67 of readerdictionary.lua:
Code:
local std_out = io.popen(
    "./sdcv --utf8-input --utf8-output -nj "
    .. ("%q"):format(word),
    "r")
From the man page the only switch I don't understand is "-nj". The "n" would be for non-interactive, but the "j" eludes me.
More importantly, the command contains no leading "/" for the fuzzy search algorithm and no leading "|" for full-text search. So koreader never explicitly asks sdcv for a fuzzy search.
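This does, however, give you some room for experimenting with sdcv. You could simply alter line 67 and put a "|" in front of the word. It has to go inside the quoted argument: appended to the end of the first part of the string, it would sit outside the quotes and the shell would treat it as a pipe.
Code:
("%q"):format("|" .. word)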
I would be interested to hear whether such a change has an impact on your problem.
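To make this kind of experiment easier to repeat, the command string could be built by a small helper. This is only a sketch: build_sdcv_command and the mode names are made up for illustration and are not part of koreader; the prefixes are the ones described in the man page quoted above.

```lua
-- Hypothetical helper, not koreader code: builds the sdcv command line
-- for a given search mode. Prefixes are taken from the sdcv man page.
local PREFIX = {
    exact    = "",   -- plain lookup, as in the current line 67
    fuzzy    = "/",  -- leading '/' requests fuzzy search
    fulltext = "|",  -- leading '|' requests full-text search
}

local function build_sdcv_command(word, mode)
    local prefix = PREFIX[mode or "exact"]
    assert(prefix, "unknown search mode: " .. tostring(mode))
    -- The prefix is placed inside the quoted argument so that the shell
    -- started by io.popen does not interpret '|' as a pipe.
    return "./sdcv --utf8-input --utf8-output -nj "
        .. ("%q"):format(prefix .. word)
end

print(build_sdcv_command("typically", "fuzzy"))
```

The result would then be passed to io.popen(cmd, "r") exactly as in line 67. Note that "%q" produces a Lua-style quoted string, which is fine for ordinary words but is not a full shell-escaping routine.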
sdcv was apparently chosen for its simplicity. A more sophisticated dictionary lookup would therefore mean either writing code to preprocess the "word" before it is passed to sdcv, or implementing an alternative command-line utility with more features. The point of a command-line tool is that it can be called from the Lua code with a one-line command. Anything more complex would require extra code wrapped around the utility, and thus more execution time.
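As a rough idea of what preformatting the "word" could look like, here is a minimal sketch that generates a list of lookup candidates by stripping common endings. The function name lookup_candidates and the suffix list are assumptions for illustration only; the rules are English-specific and far from complete, and a real solution would need a list per language.

```lua
-- Hypothetical suffix rules (English only): { Lua pattern, replacement }.
local SUFFIX_RULES = {
    { "ly$",  ""  },  -- typically -> typical
    { "ies$", "y" },  -- stories   -> story
    { "ing$", ""  },  -- reading   -> read
    { "ed$",  ""  },  -- looked    -> look
    { "s$",   ""  },  -- words     -> word
}

-- Returns the original word followed by candidate root forms, without
-- duplicates. Some candidates may not be real words; the caller would
-- try them in order and stop at the first one sdcv finds.
local function lookup_candidates(word)
    local candidates, seen = { word }, { [word] = true }
    for _, rule in ipairs(SUFFIX_RULES) do
        local stem, count = word:gsub(rule[1], rule[2])
        if count > 0 and #stem > 2 and not seen[stem] then
            seen[stem] = true
            candidates[#candidates + 1] = stem
        end
    end
    return candidates
end

for _, candidate in ipairs(lookup_candidates("typically")) do
    print(candidate)
end
```

Each candidate costs one more sdcv call, so the list should stay short to keep lookups fast.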