Old 07-12-2012, 06:00 PM   #1
tshering
Wizard
Posts: 1,183
Karma: 317474
Join Date: Jun 2012
Device: kobo touch
building custom dictionaries, especially Japanese-English

Maybe Kobo will deliver a Japanese-to-English dictionary in the near future, maybe not. Since there is already a J>J dictionary on the Kobo Touch (KT), an E>J dictionary is certainly more important for the Japanese market than a J>E dictionary, which is why I doubt that we will see a J>E dictionary soon. Therefore, I am thinking of building my own. I would like to share my ideas here to get input and help from other people.

As I understand it, these are the important points about the dictionaries:
- Each dictionary consists of one folder.
- This folder contains many gzip-compressed HTML files. They hold the dictionary entries in a simple tagged format and can easily be edited.
- The same folder also contains a file named "words". Its format is unknown to me. As for its content, I am fairly sure that it is a list of all words found in the dictionary, each with a pointer to the HTML file that holds its entry.
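
To look at the tagged format, one of the compressed files can be decompressed and printed. The following is only a minimal sketch: the file name "aa.html", the UTF-8 encoding and the helper names are my assumptions, not something confirmed for the Kobo dictionaries.

```python
import gzip


def read_entry_file(path, encoding="utf-8"):
    """Return the decompressed text of one gz-compressed dictionary file.

    The encoding is an assumption; change it if the output looks garbled.
    """
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()


def preview(path, limit=500):
    """Return the first `limit` characters, enough to inspect the tags."""
    return read_entry_file(path)[:limit]


# Hypothetical usage, run from inside a copy of a dictionary folder:
# print(preview("aa.html"))
```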

As long as the format of the "words" file is not known, it is impossible to build a dictionary from scratch. It is, however, possible to modify an existing dictionary, as long as the set of headwords stays the same. So far, I have tried this with the dictionaries of the desktop application, and it worked fine.
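
Modifying an existing dictionary could look roughly like the sketch below: copy the folder, rewrite every compressed HTML file through a function, and leave "words" untouched. The "*.html" pattern, the UTF-8 encoding and the name rewrite_dictionary are assumptions for illustration only.

```python
import gzip
import shutil
from pathlib import Path


def rewrite_dictionary(src_dir, dst_dir, transform,
                       pattern="*.html", encoding="utf-8"):
    """Copy a dictionary folder and apply `transform` to each HTML file.

    The "words" file is copied byte for byte and never modified.
    Returns the number of files whose content changed.
    """
    dst = Path(dst_dir)
    shutil.copytree(Path(src_dir), dst)  # raises if dst_dir already exists
    changed = 0
    for path in sorted(dst.glob(pattern)):
        with gzip.open(path, "rt", encoding=encoding) as handle:
            original = handle.read()
        updated = transform(original)
        if updated != original:
            with gzip.open(path, "wt", encoding=encoding) as handle:
                handle.write(updated)
            changed += 1
    return changed
```

Working on a copy keeps the original folder available as a fallback if the device rejects the modified one.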

Now my plan for the J>E dictionary:
1) Write a script that goes through all entries of the J>J dictionary and checks whether there is a corresponding entry in Jim Breen's EDICT (an open source dictionary). If there is, the script inserts EDICT's definition at the end of the J>J entry.
2) Replace the Japanese dictionary folder on the KT with the new one, maybe by modifying the upgrade file and updating manually.
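
Step 1 could be sketched as follows. The EDICT line layout used here (headword, optional reading in square brackets, then glosses separated by slashes) is the usual one. How a J>J entry is marked up and how its headword is extracted are not known yet, so append_definition simply takes the headword as an argument and appends a hypothetical paragraph tag to the entry text.

```python
import html
import re

# Typical EDICT line: 漢字 [かんじ] /(n) kanji/Chinese character/
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)(?:\s+\[(?P<reading>[^\]]+)\])?\s+/(?P<glosses>.*)/\s*$"
)


def parse_edict(lines):
    """Map each EDICT headword to its list of English glosses."""
    index = {}
    for line in lines:
        match = EDICT_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like entries
        glosses = [g for g in match.group("glosses").split("/")
                   if g and not g.startswith("EntL")]  # drop entry ids
        index.setdefault(match.group("head"), []).extend(glosses)
    return index


def append_definition(entry_html, headword, index):
    """Append the EDICT glosses to the end of one J>J entry, if any exist.

    The <p> wrapper is an assumption; the real tagged format may need
    a different element.
    """
    glosses = index.get(headword)
    if not glosses:
        return entry_html
    return entry_html + "<p>" + html.escape("; ".join(glosses)) + "</p>"
```

The EDICT file has to be opened with the right encoding before its lines are passed to parse_edict; which encoding that is depends on the download, so it is worth checking first.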

By the way, the path to the dictionaries in the "kobo3-update-2.0.0.zip" is "KoboRoot\usr\local\Kobo".

I welcome any comments, suggestions, advice, or help.