11-01-2012, 12:44 PM | #16 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
On my home computer, I have the same situation as you. There are the old dictionary folders with the gzipped html files, and the new zip files containing encrypted html files (I did not check all of them, only the E-E dictionary, dicthtml.zip).
I did the last synchronization of my reader via my office computer, which I cannot access at the moment. On my Kobo, the dicthtml.zip contains gzipped html files. The interesting point is that the encrypted html files in the dicthtml.zip of the desktop application (home computer) are dated 07.08.2012, whereas the gzipped html files in the dicthtml.zip of the KT are dated 13.10.2011! I just checked: in 1.9.17 they are dated 16.03.2012, and in 2.0.0 they are dated 09.07.2012.
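A quick way to compare two copies of dicthtml.zip is to list each html member with its date and check whether it starts with the gzip magic bytes. This is only a sketch: it assumes the chunks are the members ending in ".html", and labelling everything that is not gzip as "other" (rather than proving it is encrypted) is a heuristic.

```python
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def classify_members(zip_file):
    """List (name, date, kind) for every .html member of a dicthtml.zip.

    zip_file may be a path or a binary file object. kind is "gzip" when the
    member starts with the gzip magic bytes and "other" otherwise; treating
    "other" as "encrypted" is only a heuristic.
    """
    results = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if not info.filename.endswith(".html"):
                continue  # skip "words" and any other non-chunk members
            with zf.open(info) as member:
                head = member.read(2)
            kind = "gzip" if head == GZIP_MAGIC else "other"
            year, month, day = info.date_time[:3]
            # dates printed as DD.MM.YYYY, like the ones quoted in this thread
            results.append((info.filename, f"{day:02d}.{month:02d}.{year}", kind))
    return results
```

Running `classify_members("dicthtml.zip")` on the desktop copy and on the copy from the reader should show at a glance which one holds gzipped chunks and how the member dates differ.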
11-01-2012, 01:36 PM | #17 | |
Zealot
Posts: 105
Karma: 37668
Join Date: Feb 2012
Device: Kobo Touch
Quote:
It seems the new firmware version (which omits the dictionaries) does not get the new dictionaries automatically but creates them from the dictionaries already installed on the device. (Or does it get them from the recovery partition for some reason?) At one point I removed all dictionaries from the reader via "Manage Dictionaries" and re-added them, so all my dictionaries are "brand new", i.e. encrypted. Not closely related: I tried to add my eng->hun dictionary to the device as an addition, not a replacement, but did not succeed. Adding a new row to the Dictionary table in the kobo.sqlite database resulted in the eng-hun dictionary showing up in "Manage Dictionaries", but not as an actual choice in the dictionary selection for a word to translate/define :-(.
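Before adding a row to the Dictionary table, it may help to see what the existing rows look like, since the table's columns are not documented in this thread. The sketch below only reads: it opens the database in SQLite's read-only URI mode, so experimenting cannot corrupt the reader's database. Pass it the path of the database file (called kobo.sqlite above); nothing about the column names is assumed.

```python
import sqlite3


def describe_dictionary_table(db_path):
    """Return (column names, rows) of the Dictionary table, read-only."""
    # mode=ro makes SQLite refuse any write to the file
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk)
        cols = [row[1] for row in con.execute("PRAGMA table_info(Dictionary)")]
        rows = con.execute("SELECT * FROM Dictionary").fetchall()
    finally:
        con.close()
    return cols, rows
```

Comparing the rows of a dictionary that does appear in the selection with the hand-added eng-hun row may reveal a column that was left out.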
11-01-2012, 03:25 PM | #18 | |||
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:
Quote:
Quote:
Maybe also unrelated: when I went from 2.1.1 to 2.1.4, the J-J dictionary was definitely there. When I pointed at a Japanese word, the Japanese dictionary entry popped up, but J-J did not appear as a choice in the dictionary selection for a word to define. As far as I remember, I modified the relevant information in the database, but the situation did not change. Only after I checked it in the "Manage Dictionaries" screen and synchronized several times was J-J offered as a choice.
11-02-2012, 11:24 AM | #19 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
I don't know what I did differently this time, but I finally got the J>E dictionary working. Thanks to mnjkl and clsdclsd for their support.
Has anybody already had a look at MARISA (cf. post)? It would be nice if we could add new dictionary entries.
11-02-2012, 06:22 PM | #20 | |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Quote:
You can enumerate the current keys using the executable named marisa-reverse-lookup, requesting IDs 0, 1, 2, ... (there is no better way, i.e., there is no method to dump the entire set of keys in the dictionary at once). As far as I understand, there is no way to augment an existing dictionary with new keys: you have to store the previous set of keys, append the new keys, and build a new dictionary from that "augmented" set.
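The dump-append-rebuild workflow described above can be sketched as follows. `reverse_lookup` is a hypothetical callable (for example, a wrapper around marisa-reverse-lookup) that returns the key for an ID or None once the ID is out of range; since MARISA numbers keys 0 to n-1, stopping at the first miss collects them all. That marisa-build takes one key per line is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, Optional


def dump_keys(reverse_lookup: Callable[[int], Optional[str]]) -> List[str]:
    """Ask for IDs 0, 1, 2, ... until the (hypothetical) lookup returns None."""
    keys = []
    key_id = 0
    while (key := reverse_lookup(key_id)) is not None:
        keys.append(key)
        key_id += 1
    return keys


def augmented_keyset(old_keys: Iterable[str], new_keys: Iterable[str]) -> List[str]:
    """Old keys plus new ones, duplicates removed, sorted for readability."""
    return sorted(set(old_keys) | set(new_keys))


def write_keyset(keys: Iterable[str], path: str) -> None:
    """Write one key per line (the input format assumed here for marisa-build)."""
    with open(path, "w", encoding="utf-8") as f:
        for key in keys:
            f.write(key + "\n")
```

The resulting keyset file would then be fed to marisa-build to produce a replacement dictionary.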
11-02-2012, 08:25 PM | #21 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
@AlPe
Thank you very much for the information. Right now, I am not sure whether to invest more time in the Japanese dictionary. As things are now, selecting text in a Japanese book is so cumbersome that using the dictionary is rather a pain. One can only hope that this improves with a future update. I was hoping somebody else would go down this road, so that I could easily follow in their steps. Anyway, if I were to create a new dictionary (and at the moment I don't have the necessary knowledge to do it), I would probably also want to replace the content (I mean the html files) completely. Thank you again.
11-02-2012, 09:26 PM | #22 |
No Comment
Posts: 3,238
Karma: 23878043
Join Date: Jan 2012
Location: Australia
Device: Kobo: Not just an eReader, it's an adventure!
I've posted the direct links to the dictionaries in the Direct Links to Kobo Firmware thread.
11-03-2012, 08:55 PM | #23 | |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
I would like to say thanks to murg for maintaining the link list. This is really helpful.
Today I installed MinGW in order to compile MARISA 0.2.0 under Windows and try my hand at her tools. It would have been nice to fall in love with her. At first she didn't compile. I found a related bug report and applied the proposed solution (link). After that she compiled. I wrote some lines of random text into a file "keyset.txt" and ran marisa-benchmark and marisa-build against it. Both seemed to do their job, whatever that job exactly might be. Then I ran all the other tools, marisa-lookup and so on, against the dictionary "keyset.dic" produced by marisa-build. All of them reported the same error: Quote:
Last edited by tshering; 11-03-2012 at 08:58 PM. |
11-04-2012, 11:00 AM | #24 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
As I reported in my last post in this thread, I was able to build a marisa dictionary but unable to retrieve anything from it. "Dictionary" here means a highly compressed list of key-value pairs. This might not pass as a rigorous definition, but it is good enough for our purposes. I will call this kind of dictionary a key-dictionary.
In the Kobo dictionaries (to prevent confusion I will call them language-dictionaries), the key-dictionary is the file named "words". This "words" file is used to find out whether a looked-up expression can be found in the respective language-dictionary, and maybe for some other information too. If we knew what the values of the key-value pairs consist of, we could build our own "words" file. This in turn would enable us to insert new entries into the language-dictionaries, or to build a new dictionary from scratch. What the values look like should be easy to find out with the marisa tools; however, my attempts failed. Therefore, I can only speculate.
1) To find out whether a certain word is in the language-dictionary, it should be enough that the respective key is found in the key-dictionary. So we don't need any specific value.
2) In which html file is the looked-up expression located? Generally, it is located in an html file named after the first two letters of the expression. The word "body", for instance, is in bo.html. In this case no further information is needed, so again there is no need for a specific value.
3) How are plurals, different verb forms, and so on handled? They are listed as variants under the main heading. We find, for instance, "bodies" listed as a variant of "body" in bo.html. Still no need for a specific value.
3a) But what if the variant differs in the first two letters? We find, for instance, "went" as a variant of "go" in go.html. This could call for a specific value: one could think of key="went" and value="go". This information would be sufficient to point the search engine to go.html. Is it done this way? Let us open the English dictionary screen of the KT, type "went", and select it from the list. Surprise! It does not show the entry for "go"; "went" has its own dictionary entry in we.html. Therefore, there is still no need for a specific value. Two bytes saved.
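The chunk-naming rule observed in point 2) can be written as a one-liner. This is only a sketch of the guess made above, not Kobo's code: lowercasing the word is an assumption, and how one-letter words or words starting with digits are mapped is not known, so the sketch simply uses whatever the first two characters are.

```python
def chunk_for(word: str) -> str:
    """Guessed chunk file for a looked-up word: its first two letters + ".html".

    Assumption drawn from the observations above ("body" -> bo.html,
    "went" -> we.html); the handling of one-letter words is unknown.
    """
    return word.lower()[:2] + ".html"


for w in ["body", "bodies", "go", "went"]:
    print(w, chunk_for(w))
```

Under this rule "bodies" lands in the same file as "body", while "went" lands in we.html, which matches the separate entry found there.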
In English there are probably not many variants that differ from their main entry in the first two letters, so this handling might pay off. But how does it work in other languages, for instance German with its ablaut forms? In ha.html of the German dictionary we find, for instance, "hieb", "hiebest", "hiebet", "hiebe", "hiebst", "gehauen", and "hieben" as variants of "hauen". Are they all treated as individual entries? Let us open the German dictionary screen, type "hieb", and select any of the listed words. The first word, "hieb", takes us to the wrong entry, "Hieb"; in all other cases we read "No dictionary entry found for...". Evidently, the search engine searches in hi.html, whereas it should search in ha.html. From these observations it seems likely to me that, at least in some of the language-dictionaries, all values in the key-dictionary are empty or irrelevant.
Last edited by tshering; 11-06-2012 at 08:40 AM. Reason: Some corrections in: "In ha.html of the German dictionary,..."; replaced "from scrap" by "from scratch"
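If the chunk really is chosen from the looked-up word's own first two letters, every variant whose prefix differs from its headword's is sent to the wrong file. The sketch below lists such variants for a headword; it assumes the two-letter rule guessed above and is meant only to show how many German forms would be affected.

```python
from typing import Iterable, List


def unreachable_variants(headword: str, variants: Iterable[str]) -> List[str]:
    """Variants whose first two letters differ from the headword's.

    Under the assumed two-letter chunk rule, these are searched for in
    another file and cannot reach the headword's entry.
    """
    prefix = headword.lower()[:2]
    return [v for v in variants if v.lower()[:2] != prefix]


hauen = ["hieb", "hiebest", "hiebet", "hiebe", "hiebst", "gehauen", "hieben"]
print(unreachable_variants("hauen", hauen))        # every listed form
print(unreachable_variants("go", ["goes", "went"]))  # only "went"
```

For "hauen" every listed form is affected, which fits the "No dictionary entry found" messages; for English "go" only "went" is, and that one has its own entry.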
11-04-2012, 12:56 PM | #25 | ||
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Quote:
For 1), one usually assigns an ID to each chunk in lexicographical order, like this: 11.html is 0, aa.html is 1, etc. For 2), an easy way is to store, for word W, the offset in bytes from the beginning of the chunk at which the definition of W starts. (The dictionary is split into several chunks to allow faster fetch-decompress-find operations.) See my analysis of the Cybook Odyssey dictionaries at: http://www.albertopettarin.it/penelope.html Quote:
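The scheme described above (chunk IDs in lexicographical order of the chunk names, plus a byte offset inside the chunk) can be sketched with an in-memory dict of chunks. The entry marker `<a name="...">` is taken from the Kobo html quoted later in this thread; whether any Kobo file actually stores such (chunk ID, offset) values is not established, so this is a generic illustration of the technique.

```python
import re
from typing import Dict, Tuple

ENTRY = re.compile(rb'<a name="([^"]+)"')  # assumed start-of-entry marker


def build_index(chunks: Dict[str, bytes]) -> Dict[str, Tuple[int, int]]:
    """Map each headword to (chunk ID, byte offset of its entry).

    Chunk IDs follow the lexicographic order of the chunk names,
    e.g. 11.html -> 0, aa.html -> 1, ...
    """
    index = {}
    for chunk_id, name in enumerate(sorted(chunks)):
        for m in ENTRY.finditer(chunks[name]):
            word = m.group(1).decode("utf-8")
            index.setdefault(word, (chunk_id, m.start()))
    return index


def entry_bytes(chunks: Dict[str, bytes], index, word: str) -> bytes:
    """Jump straight to a definition: no text search needed."""
    chunk_id, offset = index[word]
    return chunks[sorted(chunks)[chunk_id]][offset:]
```

With such an index, a lookup decompresses one chunk and slices at a known offset instead of scanning it.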
11-04-2012, 03:47 PM | #26 | |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
@AlPe
Thank you very much for your comments. I very much enjoyed reading your article "Dictionaries for Bookeen Cybook Odyssey". Maybe I will study your script in order to start learning Python. Quote:
Therefore, my guess is that the position of a dictionary entry in the .html file is determined by a simple text search for name="W". That way both cases are covered: the main entry (<a name="go">) and the variant (<variant name="goes"/>). From the behaviour of the Japanese dictionary, I got the impression that things are handled a little differently there. I still have to think it through. Most important, of course, is to get the marisa tools working.
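The guessed text search can be sketched like this: find name="W" (the closing quote keeps "go" from matching "goes"), then step back to the `<a name="...">` that opens the surrounding entry, so a variant resolves to its main entry. The markup is the one quoted above; everything else is an assumption about how the reader might do it.

```python
from typing import Optional

ENTRY_OPEN = '<a name="'


def entry_start(html: str, word: str) -> Optional[int]:
    """Offset of the entry containing name="word", or None if absent."""
    pos = html.find(f'name="{word}"')
    if pos < 0:
        return None
    # step back to the <a name="..."> that opens the entry; for a headword
    # this is the match itself, for a variant it is the entry above it
    start = html.rfind(ENTRY_OPEN, 0, pos + len('name="'))
    return start if start >= 0 else pos
```

For a chunk such as `<a name="go">go</a><variant name="goes"/>`, both "go" and "goes" resolve to offset 0, i.e. the "go" entry.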
11-04-2012, 03:55 PM | #27 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Oops, I missed what you had already done.
Your explanation makes sense: after loading the right chunk, they perform a search to locate the beginning of the definition.
11-05-2012, 08:31 AM | #28 |
Member
Posts: 11
Karma: 4264
Join Date: Dec 2011
Device: kobo touch
It seems the new firmware puts the dictionaries into .kobo\dict.
11-05-2012, 11:50 AM | #29 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
From what I see, the file "words" contains only the words stored in the dictionary (the "keys"), in several variants (e.g., singular and plural forms in the Italian one).
Hence, the file "words" can only be used to know whether a query word is present in the dictionary or not. I think that the Kobo software checks whether a word is present, then matches the word to its chunk, and then performs a full-text search in the chunk to locate the beginning of the definition for the query word (quite an inefficient process, in my opinion).
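The three steps described above can be strung together in a short sketch. It assumes a dicthtml.zip with gzipped chunks named by the first two letters (as on the older firmware discussed at the start of this thread); the marisa "words" file is replaced by a plain Python set, and the encrypted chunks of newer downloads are out of scope.

```python
import gzip
import zipfile
from typing import Optional, Set


def look_up(zip_file, words: Set[str], word: str) -> Optional[str]:
    """Guessed lookup: key check, two-letter chunk, full-text search.

    zip_file is a path or binary file object of a dicthtml.zip with
    gzipped chunks; words stands in for the marisa "words" file.
    Returns the html from the match onward, or None.
    """
    if word not in words:                      # step 1: is it a key at all?
        return None
    name = word.lower()[:2] + ".html"          # step 2: pick the chunk
    with zipfile.ZipFile(zip_file) as zf:
        try:
            raw = zf.read(name)
        except KeyError:                       # no such chunk in the archive
            return None
    html = gzip.decompress(raw).decode("utf-8")
    pos = html.find(f'name="{word}"')          # step 3: text search
    return html[pos:] if pos >= 0 else None
```

Fed a chunk ha.html that lists "hieb" as a variant of "hauen", this sketch returns None for "hieb" because it looks in hi.html, much like the German dictionary behaves on the reader.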
11-05-2012, 01:03 PM | #30 | ||
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:
Quote:
As for the Japanese dictionary, there seem to be more steps involved in determining whether a word is present in the dictionary. In Japanese there are (at least) two ways of writing a word: in Kanji (logographic characters) and in Kana (phonetic characters). In the file "words", both spellings of an expression are put one after the other (Kana[Kanji]). As I understand marisa, it can find strings that match the search string exactly and strings that start with the search string. So in order to search for a Kanji spelling in "words", it would be necessary to pair the Kanji with its Kana reading first. Last edited by tshering; 11-06-2012 at 08:44 AM.
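The consequence of the Kana[Kanji] key layout can be shown on a small sorted key list. The sketch emulates marisa's exact lookup and predictive (starts-with) search with `bisect`; the sample keys are made up, and the real "words" file may contain more than this layout. With the reading known, one exact lookup suffices; with only the Kanji, every key has to be scanned, since the Kanji sits at the end of the key.

```python
import bisect
from typing import List, Optional


def predictive_search(sorted_keys: List[str], prefix: str) -> List[str]:
    """All keys starting with prefix, emulated on a sorted list."""
    i = bisect.bisect_left(sorted_keys, prefix)
    out = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        out.append(sorted_keys[i])
        i += 1
    return out


def find_kanji(sorted_keys: List[str], kanji: str, kana: Optional[str] = None) -> List[str]:
    """Find "Kana[Kanji]" keys for a Kanji spelling."""
    if kana is not None:
        key = f"{kana}[{kanji}]"               # reading known: exact lookup
        j = bisect.bisect_left(sorted_keys, key)
        return [key] if j < len(sorted_keys) and sorted_keys[j] == key else []
    # reading unknown: neither exact nor prefix search helps, scan everything
    return [k for k in sorted_keys if k.endswith(f"[{kanji}]")]


keys = sorted(["はし[橋]", "はし[箸]", "はし[端]", "はな[花]"])
print(predictive_search(keys, "はし"))  # all three Kanji spellings read はし
print(find_kanji(keys, "箸", "はし"))
```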