11-16-2012, 01:00 PM   #1
tshering
Wizard
Posts: 1,274
Karma: 332560
Join Date: Jun 2012
Device: kobo touch
A suggestion to the Kobo developers

Edit: This post is outdated, cf. this post.
I tested the Japanese dictionary of the KT for some time. I am sorry to say that I found the dictionary function to be of almost no use. I am not saying that the dictionary itself is too small or of poor quality. That may or may not be the case. The real problem is that the search engine cannot reliably retrieve the proper data from the dictionary. The reason is easy to see. The first thing the search engine does is retrieve from a database one of the (usually several) possible phonetic representations (= kana) of the searched-for kanji. Because there are often several possible readings, selecting one of them without any further effective check is rather arbitrary. This is the first weak point.
In the next step, the phonetic representation together with the kanji is looked up in an index file ("words") to ascertain whether the dictionary has an entry corresponding to the searched-for word or expression. If it does, the search engine turns to a particular html file to retrieve the dictionary entry. The name of that html file is determined by the first two letters of the phonetic representation. Whether this is the appropriate html file therefore depends on whether the kanji-kana matching happened to fit the context. Even if the appropriate html file is accessed, there is no guarantee that the search engine will come up with the correct result, because at this stage the engine has already "forgotten" which kanji the user was looking for. It therefore happily presents the first entry that matches the phonetic representation as the result.
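To make the failure mode concrete, here is a minimal Python sketch of the lookup flow as I described it above. It is only a toy model: the tables, the file layout and the function name lookup_by_reading are my own assumptions for illustration, not the actual KT data or code.

```python
# Toy model of the lookup flow described above.
# All data and names here are hypothetical; the real KT database,
# "words" index and html files are not reproduced.

# Step 1: database mapping kanji to candidate readings (kana).
KANJI_TO_KANA = {
    "橋": ["はし"],
    "箸": ["はし"],
    "市": ["いち", "し"],
}

# Step 2: index of (reading, headword) pairs the dictionary contains.
WORDS_INDEX = {("はし", "橋"), ("はし", "箸"), ("いち", "市"), ("し", "市")}

# Step 3: "html files" named by the first two kana of the reading.
# Each file lists (reading, headword, gloss) in file order.
HTML_FILES = {
    "はし": [("はし", "箸", "chopsticks"), ("はし", "橋", "bridge")],
    "いち": [("いち", "市", "market")],
    "し": [("し", "市", "city")],
}


def lookup_by_reading(kanji):
    """Return a gloss the way the described engine would, or None."""
    readings = KANJI_TO_KANA.get(kanji)
    if not readings:
        return None
    kana = readings[0]  # first weak point: one reading picked arbitrarily
    if (kana, kanji) not in WORDS_INDEX:
        return None
    for entry_kana, _entry_kanji, gloss in HTML_FILES.get(kana[:2], []):
        # second weak point: only the reading is compared,
        # the kanji the user selected is no longer consulted
        if entry_kana == kana:
            return gloss
    return None


print(lookup_by_reading("橋"))
print(lookup_by_reading("市"))
```

With this toy data, looking up 橋 returns the entry for 箸, because both are read はし and 箸 happens to come first in the file. Looking up 市 always yields the "market" entry, since only the first reading is ever tried.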
The fact that the Japanese language has an extremely large number of homophones speaks clearly against the chosen approach.
My suggestion would therefore be to take the kanji (or kanji+kana expressions, as the case may be) as the primary means of organizing the material. The only necessary modification to the general procedure (the one the KT follows for the other languages) would be that the name of each html file would consist of only one character (= kanji). This would result in approximately 4749 html files in the case of a decent Japanese dictionary (the present Japanese dictionary consists of ca. 4100 files). My calculation is based on a version of the free edict dictionary file that contains 205 721 entries (cf. http://www.csse.monash.edu.au/~jwb/edict.html). To take care of words that are commonly written in kana rather than in kanji, and of words starting with numeric characters, some further html files would be needed (232 in the case of my edict version).
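A rough Python sketch of the proposed organization follows. Again this is only an illustration under my own assumptions: the entry tuples, the function names build_buckets and lookup_by_headword, and the sample words are hypothetical, and real edict data would need to be parsed first.

```python
from collections import defaultdict

# Hypothetical sample entries: (headword, reading, gloss).
ENTRIES = [
    ("橋", "はし", "bridge"),
    ("箸", "はし", "chopsticks"),
    ("橋渡し", "はしわたし", "mediation"),
    ("はい", "はい", "yes"),  # word commonly written in kana
]


def build_buckets(entries):
    """Group entries by the first character of the headword.

    Each bucket stands in for one html file named after that character.
    """
    buckets = defaultdict(list)
    for headword, reading, gloss in entries:
        buckets[headword[0]].append((headword, reading, gloss))
    return buckets


def lookup_by_headword(buckets, word):
    """Return all (reading, gloss) pairs whose headword equals word."""
    if not word:
        return []
    return [
        (reading, gloss)
        for headword, reading, gloss in buckets.get(word[0], [])
        if headword == word  # the written form itself is the key
    ]


buckets = build_buckets(ENTRIES)
print(len(buckets))
print(lookup_by_headword(buckets, "橋"))
print(lookup_by_headword(buckets, "箸"))
```

Because the written form selects both the file and the entry, homophones such as 橋 and 箸 can no longer be confused, and words written in kana simply get buckets of their own.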
I think the improvement would be tremendous and well worth the effort. I would be happy to answer any questions you may have.

Last edited by tshering; 12-20-2012 at 02:52 PM.