07-27-2013, 05:01 PM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: May 2012
Device: kindle
|
Colordict with Stardict - morphology
Hi,
I am trying to get ColorDict to work on my (rooted) Nook, because it is the only dictionary that integrates with the PageTurner reader. I have a couple of StarDict-format dictionaries that work just fine with morphology on my desktop machine: for example, if I search for "worked", it returns the entry for "work". They also work with Fora Dictionary on this Nook. However, they don't work in ColorDict, and I don't understand why. I installed those same dictionaries into ColorDict by copying them to the dictdata folder on the SD card. ColorDict sees them and indexes them, yet it does not find inflected entries: if I enter "work", I get a definition, but if I enter "worked", it tells me the word is not found. Does ColorDict not support morphology? Or how do I get it to work?
Thanks, Myrosia
08-18-2013, 06:33 PM | #2 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
The StarDict format does not support morphology. I suspect that your desktop software has some kind of "fuzzy word" functionality that looks for entries resembling the search term, which could mean it just looks for a match on as many letters as possible. ColorDict doesn't have this feature, and although it's a nice tool, it seems to me that it hasn't seen any updates for a long time.
Did you try GoldenDict instead? I don't use it myself because of minor flaws that haven't been resolved, even though they have been well known for over a year now. (I don't know whether it has this fuzzy word feature or whether it integrates with PageTurner; it's just a guess.)
08-18-2013, 06:48 PM | #3 |
Junior Member
Posts: 6
Karma: 10
Join Date: May 2012
Device: kindle
|
Unfortunately, GoldenDict does not integrate with PageTurner. And yes, it does support morphology and works fine with my StarDict-format files. My desktop software is actually the original StarDict, and I am fairly sure it supports morphology properly, because the searches are quite accurate, never show the errors I'd expect from fuzzy word searches, and also find irregular verbs, etc.
08-19-2013, 09:33 AM | #4 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
The StarDict format specification has no such thing as morphology. Another guess about what could have led to your impression: perhaps the dictionary you are using makes very extensive use of the synonyms function as a substitute for proper morphology features. For English this might work well, but for more complex languages like French, German or even Latin it will certainly result either in an extremely huge synonyms file or in very inaccurate results.
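For reference, the synonyms approach works through the StarDict ".syn" file. Per the format specification, each record is a NUL-terminated UTF-8 word followed by a 32-bit big-endian index into the sorted ".idx" entries. The minimal Python sketch below encodes such records; the function names and word list are hypothetical, and the sort key only approximates the spec's ASCII-case-insensitive ordering.

```python
import struct


def stardict_key(word: str) -> tuple[bytes, bytes]:
    # Approximates the spec's ordering: ASCII case-insensitive comparison
    # first, with a plain byte comparison as the tie-breaker.
    raw = word.encode("utf-8")
    return (raw.lower(), raw)


def build_syn(headwords: list[str], synonyms: dict[str, str]) -> bytes:
    """Encode .syn records mapping each synonym to its headword's .idx position."""
    # .idx entries are stored sorted, so a headword's index is its sorted position.
    position = {w: i for i, w in enumerate(sorted(headwords, key=stardict_key))}
    out = bytearray()
    for syn in sorted(synonyms, key=stardict_key):
        out += syn.encode("utf-8") + b"\0"                 # NUL-terminated UTF-8 word
        out += struct.pack(">I", position[synonyms[syn]])  # 32-bit big-endian index
    return bytes(out)


# Hypothetical toy data: every inflected form needs its own record.
syn_blob = build_syn(["work", "mouse"], {"worked": "work", "works": "work", "mice": "mouse"})
```

Every inflected form costs one record, which is why the file grows quickly for highly inflected languages. The ".ifo" file must also declare the record count as synwordcount.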
To explain what a "real" morphology feature would look like: you input, e.g., the French word "était", and the dictionary's output is "third person singular imparfait indicative active of <être>", which somehow links you to the entry for "être". The StarDict format cannot handle this. Of course there are ways to simulate it, but I don't know of any StarDict dictionary that actually achieves that, and I dare say it would lead to an incredibly huge dictionary (millions of entries for modestly complex languages).
Last edited by tuxor; 08-19-2013 at 09:37 AM.
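A toy Python sketch of the contrast described here: a morphological analysis returns a grammatical description plus a link to the headword, whereas a ".syn"-style synonym only records which headword a form points to. The analysis table is hand-written and hypothetical; a real analyzer would generate it from rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str        # headword the inflected form links to
    description: str  # grammatical description shown to the reader


# Hypothetical, hand-written analyses; a real analyzer would derive them from rules.
ANALYSES = {
    "était": [Analysis("être", "third person singular imparfait indicative active")],
    "étais": [
        Analysis("être", "first person singular imparfait indicative active"),
        Analysis("être", "second person singular imparfait indicative active"),
    ],
}

# A .syn-style synonym keeps only the link, not the grammatical information.
SYNONYMS = {"était": "être", "étais": "être"}


def describe(form: str) -> list[str]:
    """Render every analysis of `form` in the style quoted above."""
    return [f"{a.description} of <{a.lemma}>" for a in ANALYSES.get(form, [])]
```

Note that "étais" is ambiguous between two persons; the synonym table cannot express that, while the analysis list can.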
08-19-2013, 10:01 AM | #5 | ||
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
However, the dictionary client can only use them if they were included in the source file, and most StarDict dictionaries simply don't contain them, which might lead users to the erroneous assumption that StarDict can't handle inflections.
BTW, inflections can be defined using the Babylon glossary format (GLS):
[blank line]
Term | Alternate1 | Alternate2 | ... | AlternateK
[attributes]
Definition
[blank line]
Quote:
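The GLS entry layout described in post #5 can be read with a few lines of Python. This is a minimal sketch assuming a simplified layout: one term line with alternates separated by "|", followed by definition lines, with entries separated by blank lines. The optional attributes line and the GLS file header are not handled, and the function names and sample entries are hypothetical.

```python
import re


def parse_gls(text: str) -> dict[str, tuple[list[str], str]]:
    """Map each term to (alternates, definition) for blank-line-separated entries."""
    entries = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        term, *alternates = [part.strip() for part in lines[0].split("|")]
        entries[term] = ([a for a in alternates if a], "\n".join(lines[1:]))
    return entries


def inflection_index(entries: dict[str, tuple[list[str], str]]) -> dict[str, str]:
    """Invert the alternates so an inflected form resolves to its term."""
    return {alt: term for term, (alts, _) in entries.items() for alt in alts}


# Hypothetical sample in the simplified layout (no attributes line, no file header).
SAMPLE = "mouse | mice\nsouris\n\nbring | brings | brought | bringing\napporter\n"
```

The inverted index is essentially what a converter would need in order to emit one synonym record per alternate.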
08-19-2013, 10:26 AM | #6 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
I think you are mixing up dictionary formats somehow. The StarDict format specification, which doesn't include anything like morphology or inflections, is here: http://code.google.com/p/babiloo/wiki/StarDict_format
And of course there is no "linguistic compression technique" in the StarDict format: the contents are stored as simple gzip data or even completely uncompressed.
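To illustrate how plain the storage is: each ".idx" record is a NUL-terminated UTF-8 word followed by a big-endian offset and size into the ".dict" data. The Python sketch below assumes the default 32-bit offsets (the spec also allows 64-bit offsets via idxoffsetbits). A ".dict.dz" file is a dictzip file, which is an ordinary gzip stream with an extra header field, so gzip.decompress can read it whole; a real client would use the dictzip chunk table for random access instead. Function names and data are hypothetical.

```python
import gzip
import struct


def parse_idx(idx_blob: bytes) -> dict[str, tuple[int, int]]:
    """Decode .idx records: NUL-terminated UTF-8 word, then 32-bit big-endian offset and size."""
    entries, pos = {}, 0
    while pos < len(idx_blob):
        end = idx_blob.index(b"\0", pos)
        word = idx_blob[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(">II", idx_blob, end + 1)
        entries[word] = (offset, size)
        pos = end + 1 + 8  # skip the NUL and the two 32-bit fields
    return entries


def read_article(dict_blob: bytes, offset: int, size: int, compressed: bool) -> str:
    """Slice one article out of the .dict data (whole-file decompression for simplicity)."""
    data = gzip.decompress(dict_blob) if compressed else dict_blob
    return data[offset:offset + size].decode("utf-8")


# Hypothetical toy dictionary: two articles stored back to back.
dict_raw = ("to labour" + "rodent").encode("utf-8")
idx_blob = b"mouse\0" + struct.pack(">II", 9, 6) + b"work\0" + struct.pack(">II", 0, 9)
dict_dz = gzip.compress(dict_raw)  # stands in for a .dict.dz file
```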
Using the StarDict format, it is indeed possible to store the dictionary entries in an arbitrary markup language, but only HTML, MediaWiki, Pango and XDXF are officially supported. Only the last of these can handle morphology and inflections. Looking up inflected word forms is definitely not part of the StarDict format specification, and there is no canonical implementation of such a feature based on the StarDict format. You are definitely mixing up StarDict with some competing dictionary format like Abbyy Lingvo, MobiPocket or Babylon BGL, all of which have certain morphology features and might include "linguistic compression techniques"; since those formats are not officially documented, I can't tell.
// EDIT By the way: maybe it's not even desirable to have the morphology information as part of the dictionary. It seems slightly more feasible to build morphology support into the dictionary software, which would then use a tool like hunspell and existing hunspell dictionaries to recognize inflected forms. A real-world example of "proper" morphology support: http://www.perseus.tufts.edu/hopper/...eek&prior=a%29
Last edited by tuxor; 08-19-2013 at 11:20 AM.
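A toy Python sketch of the software-side idea: hunspell-style suffix rules, each with a flag, characters to strip, the suffix, and a condition on the stem ending, are run in reverse to recover a dictionary stem from an inflected form. The rules and stems below are hypothetical and much simpler than a real .aff/.dic pair, and this is not hunspell's own API.

```python
import re
from typing import NamedTuple


class SuffixRule(NamedTuple):
    flag: str       # affix class a stem must carry to accept this suffix
    strip: str      # characters removed from the stem before the suffix is added
    affix: str      # the suffix itself
    condition: str  # regex the end of the stem must match


# Toy rules in the spirit of hunspell "SFX" lines (hypothetical, not a real .aff file).
RULES = [
    SuffixRule("D", "", "d", "e"),             # bake  -> baked
    SuffixRule("D", "y", "ied", "[^aeiou]y"),  # carry -> carried
    SuffixRule("D", "", "ed", "[^ey]"),        # work  -> worked
    SuffixRule("D", "", "ed", "[aeiou]y"),     # play  -> played
    SuffixRule("S", "", "s", "[^sxzhy]"),      # work  -> works
]

# Stems with the affix flags they accept, like "work/DS" in a .dic file.
STEMS = {"work": "DS", "carry": "D", "bake": "DS", "play": "DS"}


def find_stems(word: str) -> list[str]:
    """Return dictionary stems that produce `word` directly or via one suffix rule."""
    found = [word] if word in STEMS else []
    for rule in RULES:
        if not word.endswith(rule.affix) or len(word) <= len(rule.affix):
            continue
        stem = word[: -len(rule.affix)] + rule.strip  # undo the suffix, restore stripped chars
        if rule.flag in STEMS.get(stem, "") and re.search(f"(?:{rule.condition})$", stem):
            found.append(stem)
    return found
```

For example, find_stems("carried") undoes "ied", restores "y", checks that "carry" ends in consonant plus "y" and carries flag D, and returns ["carry"]. The dictionary itself then only needs the headword "carry".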
08-19-2013, 11:54 AM | #7 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Since you're probably still fully convinced that I'm wrong, please find attached a small English-French proof-of-concept dictionary with four entries and inflections. It will find the singular and plural forms of mouse and louse and the conjugated verb forms of bring and catch. The .babylon source file is included. No, I don't, since I've successfully created several StarDict dictionaries with inflections myself. BTW, StarDict was designed in part as a Babylon clone and supports many of its features, including inflections. However, you're correct that it doesn't employ linguistic compression, which is used by Kindle dictionaries.
08-19-2013, 12:05 PM | #8 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Purely as a matter of interest, what is "linguistic compression"?
08-19-2013, 12:06 PM | #9 | |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
Quote:
I already talked about "simulating" morphology support using the synonyms feature of StarDict. And it is definitely insane to simulate morphology support this way whenever the language is a bit more complex: for Hungarian, Ancient Greek or even German the synonyms file would become incredibly large, and you still wouldn't end up with "proper" morphology support. Have a look at http://www.manpagez.com/man/4/hunspell/ for an example of what morphology support could technically look like. For a reasonably complex language like French you only need about 1.4 MB of (uncompressed!) data to support basic morphology features in hunspell. With the StarDict synonym file you would definitely need more...
// EDIT Btw: if the OP had been talking about simple morphology simulation using synonyms, he wouldn't have reported that it doesn't work with ColorDict, because ColorDict has perfect support for synonym files!
Last edited by tuxor; 08-19-2013 at 12:14 PM.
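A rough way to see the growth argued here: with a .syn file every inflected form costs its own record (UTF-8 word, NUL terminator, 4-byte index), so the total scales with stems times forms, whereas an affix file states each ending once. The toy arithmetic below uses two tenses of regular French -er verbs; the stems are illustrative only, and real paradigms have many more forms.

```python
def syn_record_size(form: str) -> int:
    """Bytes one .syn record takes: UTF-8 word, NUL terminator, 4-byte index."""
    return len(form.encode("utf-8")) + 1 + 4


def full_form_cost(stems: list[str], endings: list[str]) -> tuple[int, int]:
    """Number of distinct forms and total .syn bytes when every form is listed."""
    forms = {stem + ending for stem in stems for ending in endings}
    return len(forms), sum(syn_record_size(form) for form in forms)


# Present and imparfait endings of regular French -er verbs (only two tenses shown).
ENDINGS = ["e", "es", "e", "ons", "ez", "ent", "ais", "ais", "ait", "ions", "iez", "aient"]
```

full_form_cost(["parl"], ENDINGS) gives 10 distinct forms and 119 bytes for a single verb in two tenses; every additional verb adds its own full set of records.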
08-19-2013, 01:50 PM | #10 | |||||
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
According to the Mobipocket website, Mobipocket/Kindle dictionaries use the Levenshtein distance algorithm to keep the file size down. For more information see this Mobipocket website article. Quote:
Quote:
Quote:
Quote:
You may really want to do some actual tests instead of relying purely on third-hand information!
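For anyone wondering what the Levenshtein distance mentioned above measures: it is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The thread doesn't say how Mobipocket uses it to keep file size down, so this Python sketch only computes the distance itself (classic dynamic programming with a single rolling row) and is not Mobipocket's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, "work" and "worked" are two insertions apart.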
08-19-2013, 02:51 PM | #11 | ||||
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
Quote:
Quote:
Quote:
Quote:
Let me conclude: I'm now well aware of your definition of "StarDict supports morphology", and I accept that definition in order to make some progress in this thread. We can now proceed with the OP's question: does ColorDict support morphology? Because I use ColorDict daily with dictionaries that make heavy use of the ".syn" file of the StarDict format, I have to say: yes, ColorDict supports morphology. The OP reports that ColorDict does not show the entry for "work" when he looks up "worked". Hence we can conclude that his dictionary does not have the entry "worked" in its ".syn" file. The only thing left to clarify: how come his desktop application does show the entry "work" when he looks up "worked", even though "worked" is not in the ".syn" file of the dictionary?
Last edited by tuxor; 08-19-2013 at 02:58 PM.
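The reasoning above can be illustrated with a hypothetical client that consults only the ".idx" headwords and the ".syn" table, with no fuzzy or rule-based fallback. This Python sketch is not ColorDict's actual code, which the thread doesn't show; the names and data are assumptions, and a real client would binary-search the sorted index rather than scan a list.

```python
import struct


def parse_syn(syn_blob: bytes) -> dict[str, int]:
    """Decode .syn records: NUL-terminated UTF-8 word + 32-bit big-endian .idx index."""
    synonyms, pos = {}, 0
    while pos < len(syn_blob):
        end = syn_blob.index(b"\0", pos)
        word = syn_blob[pos:end].decode("utf-8")
        (index,) = struct.unpack_from(">I", syn_blob, end + 1)
        synonyms[word] = index
        pos = end + 5  # skip the NUL and the 4-byte index
    return synonyms


def lookup(query: str, headwords: list[str], synonyms: dict[str, int]):
    """Exact headword match first, then the .syn table; no other fallback."""
    if query in headwords:
        return query
    if query in synonyms:
        return headwords[synonyms[query]]
    return None  # e.g. "worked" when it was never listed in the .syn file


HEADWORDS = ["mouse", "work"]  # hypothetical .idx order
SYN = b"mice\0" + struct.pack(">I", 0) + b"works\0" + struct.pack(">I", 1)
```

With this data, "works" and "mice" resolve, but "worked" returns None: exactly the symptom the OP describes when a form is missing from the ".syn" file.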