Old 01-24-2015, 08:05 AM   #695
Markismus
Guru
Posts: 959
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Expanding the functionality of dictionary lookup

Quote:
Originally Posted by sadowski
The "fuzzy" search algorithm that stardict implements can give inadequate results sometimes....

There seem to be 2 problems with this look-up algorithm:
1. If there is no exact match, stardict falls back to a fuzzy search,...

2. In most European languages, word roots can be found by manipulating endings, e.g., typically --> typical. This is language specific but makes dictionaries much more efficient.
Anyone else stumbled over this? Any suggestions?

On the koreader/sdcv page, Chrox refers to a man page for more information. Its description states:
Quote:
sdcv is simple, cross-platform text-base utility for work with dictionaries in StarDict's format. The word from "list of words" may be string with leading '/' for using Fuzzy search algorithm, with leading '|' for using full-text search, string may contain '?' and '*' for using regexp search....
Sdcv is called in line 67 in readerdictionary.lua:
Code:
 local std_out = io.popen(
    "./sdcv --utf8-input --utf8-output -nj "
    ..
    ("%q"):format(word)
 , "r")
The only switch I can't explain from the man page is "-nj". The "n" would stand for non-interactive, but the "j" eludes me.

More importantly, the word is passed without a leading "/" for the fuzzy search algorithm and without a leading "|" for full-text search. So koreader doesn't run sdcv in fuzzy mode.

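This does, however, give you some room for experimenting with sdcv. You could alter the call at line 67 so that a "|" is prepended to the word itself, inside the quoted argument. (A bare "|" at the end of the first part of the string would be read by the shell as a pipe, not passed to sdcv.)
Code:
 local std_out = io.popen(
    "./sdcv --utf8-input --utf8-output -nj "
    ..
    -- "|" asks for full-text search; use "/" instead to try fuzzy search
    ("%q"):format("|" .. word)
 , "r")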
I'd be interested to hear whether such a change has an impact on your problem.
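
To make switching between the search modes easier while experimenting, the command string could be built by a small helper. This is only a sketch: build_sdcv_command and the mode names are hypothetical and not part of koreader, and the "/" and "|" prefixes are taken from the man page description quoted above. The prefix is placed inside the quoted argument so that the shell never sees a bare "|". Note that "%q" applies Lua-style quoting, which the shell mostly accepts but which is not a full shell escape.

```lua
-- Hypothetical helper, not part of koreader: builds the sdcv command line
-- with an optional search-mode prefix placed inside the quoted argument.
local PREFIX = {
    exact = "",      -- plain lookup, as in the current koreader call
    fuzzy = "/",     -- leading '/' per the sdcv man page description
    fulltext = "|",  -- leading '|' per the sdcv man page description
}

local function build_sdcv_command(word, mode)
    local prefix = PREFIX[mode or "exact"]
    if prefix == nil then
        error("unknown search mode: " .. tostring(mode))
    end
    -- %q quotes prefix and word together as a single argument
    return "./sdcv --utf8-input --utf8-output -nj "
        .. ("%q"):format(prefix .. word)
end

print(build_sdcv_command("typically", "fuzzy"))
-- prints: ./sdcv --utf8-input --utf8-output -nj "/typically"
```

The result could then be handed to io.popen(cmd, "r") in place of the hand-built string.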

Since sdcv was apparently chosen for its simplicity, a more sophisticated dictionary lookup would mean either writing code to preformat the "word" before it is passed to sdcv, or implementing an alternative command-line utility with more features. The appeal of a command-line utility is that it can be called from within the Lua code with a one-line command! Anything more complex would require extra wrapper code around the utility and therefore more execution time.
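
As a rough illustration of what preformatting the "word" could look like for sadowski's second point (typically --> typical), here is a sketch that generates candidate forms by stripping endings. Everything in it is an assumption: candidate_forms and the suffix list are made up for this example, the rules are English-only and far from complete, and a real list would have to be written per language. Each candidate would be looked up in turn until sdcv returns a result.

```lua
-- Hypothetical, English-only suffix rules; a real list would be per language.
local SUFFIX_RULES = {
    { "ies$", "y" },  -- stories   -> story
    { "ly$",  ""  },  -- typically -> typical
    { "ing$", ""  },  -- walking   -> walk
    { "ed$",  ""  },  -- walked    -> walk
    { "s$",   ""  },  -- books     -> book
}

-- Returns the original word followed by every distinct form produced by
-- applying one suffix rule; forms shorter than 3 characters are dropped.
local function candidate_forms(word)
    local candidates = { word }
    local seen = { [word] = true }
    for _, rule in ipairs(SUFFIX_RULES) do
        local form, count = word:gsub(rule[1], rule[2])
        if count > 0 and #form > 2 and not seen[form] then
            seen[form] = true
            candidates[#candidates + 1] = form
        end
    end
    return candidates
end

for _, form in ipairs(candidate_forms("typically")) do
    print(form)
end
```

The rules are deliberately naive, so some candidates will not be real words (for example "storie" from "stories"); those lookups simply return nothing. The price is one extra sdcv call per candidate, which is the execution-time cost mentioned above.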