MobileRead Forums > E-Book Readers > Android Devices

Old 07-27-2013, 05:01 PM   #1
myrosia
Junior Member
Posts: 6
Karma: 10
Join Date: May 2012
Device: kindle
Colordict with Stardict - morphology

Hi,

I am trying to get ColorDict to work on my (rooted) Nook, because it is the only dictionary that integrates with the PageTurner reader.

I have a couple of StarDict-format dictionaries that work just fine with morphology on my desktop machine. For example, I search for "worked", and it returns the entry for "work". They also work with the Fora dictionary on this Nook. However, they don't work in ColorDict, and I don't understand why.

I installed those same dictionaries into ColorDict by copying them to dictdata on the SD card. ColorDict sees them and indexes them, yet it does not find inflected forms. If I enter "work", I get a definition. If I enter "worked", it tells me that the word is not found.

Does ColorDict not support morphology? If it does, how do I get it to work?

Thanks,

Myrosia
Old 08-18-2013, 06:33 PM   #2
tuxor
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
The StarDict format does not support morphology. I suspect that your desktop software has some kind of "fuzzy word" functionality that looks for entries resembling the search term, which could mean it just looks for a match in as many letters as possible. ColorDict doesn't have this feature, and although it's a nice tool, it seems to me that it hasn't seen any updates for a long time.

Did you try GoldenDict instead? I don't use it myself because of minor flaws that haven't been resolved, even though they have been well known for over a year now. (I don't know whether it has this fuzzy word feature or whether it will integrate with PageTurner; it's just a guess.)
Old 08-18-2013, 06:48 PM   #3
myrosia
Junior Member
Posts: 6
Karma: 10
Join Date: May 2012
Device: kindle
Unfortunately, GoldenDict does not integrate with PageTurner. And yes, it does support morphology, and it works fine with my StarDict-format files. My desktop software is actually the original StarDict, and I am fairly sure it supports morphology properly, because the searches are quite accurate, never show the errors I'd expect from fuzzy word searches, and also find irregular verbs, etc.
Old 08-19-2013, 09:33 AM   #4
tuxor
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
In the StarDict format specification there is no such thing as morphology. Another guess about what could lead to your impression: perhaps the dictionary you are using relies very heavily on the synonyms function as a substitute for proper morphology features. For English this might work well; for more complex languages like French, German or even Latin it will certainly end up with either an extremely large synonyms file or very inaccurate results.

To explain what a "real" morphology feature would look like: you input, e.g., the French word "était" and the dictionary's output is "third person singular imparfait indicative active of <être>", which somehow links you to the entry for "être".

The StarDict format cannot handle this. Of course there are ways to simulate it, but I don't know of any StarDict dictionary that actually achieves that, and I dare say it would lead to an incredibly huge dictionary (millions of entries for modestly complex languages).

Last edited by tuxor; 08-19-2013 at 09:37 AM.
Old 08-19-2013, 10:01 AM   #5
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by tuxor View Post
In the StarDict format specification there is no such thing as morphology.
That is incorrect. While StarDict doesn't feature a full-fledged parser, it does support inflections, which is quite sufficient for most users.
However, the dictionary client can only use them if they were included in the source file, and most StarDict dictionaries simply don't contain them, which might lead users to the erroneous assumption that StarDict can't handle inflections.

BTW, inflections can be defined using the Babylon glossary format (GLS):

[blank line]
Term | Alternate1 | Alternate2| ... | AlternateK
[attributes]
Definition
[blank line]
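
As a rough illustration of the entry layout above, here is a minimal Python sketch that reads such a source file and builds a table mapping every alternate form to its headword. The sample entries, the function names and the handling of the optional attributes line are assumptions made for this example; this is not the StarDict compiler's actual code.

```python
import re

def parse_babylon_source(text):
    """Parse blank-line-separated entries: 'Term|Alt1|Alt2' then a definition."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # need at least a headword line and a definition line
        forms = [f.strip() for f in lines[0].split("|") if f.strip()]
        # Assumption: lines between the first and last are attributes,
        # and the last line of the block is the definition.
        entries.append({
            "headword": forms[0],
            "alternates": forms[1:],
            "definition": lines[-1],
        })
    return entries

def build_lookup(entries):
    """Map every form (headword and alternates) to its headword."""
    table = {}
    for entry in entries:
        for form in [entry["headword"], *entry["alternates"]]:
            table.setdefault(form.lower(), entry["headword"])
    return table

# Hypothetical sample data in the layout shown above.
SAMPLE = """
bring|brings|bringing|brought
apporter

mouse|mice
souris
"""

entries = parse_babylon_source(SAMPLE)
lookup = build_lookup(entries)
print(lookup.get("brought"))  # headword for an inflected form
print(lookup.get("worked"))   # None: form was never listed in the source
```

The point of the sketch is that an inflected form is only found if the source file lists it explicitly as an alternate.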


Quote:
Originally Posted by tuxor View Post
The StarDict format cannot handle this. Of course there are ways to simulate it, but I don't know of any StarDict dictionary that actually achieves that, and I dare say it would lead to an incredibly huge dictionary (millions of entries for modestly complex languages).
You're again wrong. The StarDict compiler uses linguistic compression techniques to keep the overall dictionary size relatively small, even if the dictionary contains lots of inflections. It can easily handle languages such as French, German and Latin.
Old 08-19-2013, 10:26 AM   #6
tuxor
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
I think you are mixing up dictionary formats somehow. Here is the StarDict format specification: http://code.google.com/p/babiloo/wiki/StarDict_format - it doesn't include anything like morphology or inflections. And of course there is no "linguistic compression technique" as part of the StarDict format - the contents are stored as simple gzip data or even completely uncompressed.

Using the StarDict format, it is indeed possible to store the dictionary entries in an arbitrary markup language, but only HTML, MediaWiki, Pango and XDXF are officially supported. Only the last of these can handle morphology and inflections. Looking up inflected word forms is definitely not part of the StarDict format specification, and there is no canonical implementation of such a feature based on the StarDict format.

You are definitely mixing up StarDict with some competing dictionary format like Abbyy Lingvo, MobiPocket or Babylon BGL, all of which have certain morphology features and might include "linguistic compression techniques". Since those formats are not officially documented, I can't tell.


// EDIT

By the way: maybe it's not even desirable to have the morphology information as part of the dictionary. It seems slightly more feasible to build morphology support into the dictionary software: the software would then use a tool like hunspell and existing hunspell dictionaries to recognize inflected forms.
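
To make the idea of software-side morphology concrete, here is a minimal Python sketch that strips suffixes from the search term and retries the lookup. The rule table, the irregular-form table and the function names are invented for illustration; this is neither hunspell's affix file syntax nor its API.

```python
# Hypothetical suffix rules: (suffix to strip, replacement).
# NOT hunspell's real .aff syntax, just a toy stand-in.
SUFFIX_RULES = [
    ("ied", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("es", ""),
    ("s", ""),
]

# Irregular forms cannot be derived by rules, so they are listed explicitly.
IRREGULAR = {"brought": "bring", "mice": "mouse"}

def candidate_stems(word):
    """Yield the word itself, then possible base forms, in priority order."""
    word = word.lower()
    yield word
    if word in IRREGULAR:
        yield IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement

def lookup(word, headwords):
    """Return the first candidate that is an actual headword, else None."""
    for stem in candidate_stems(word):
        if stem in headwords:
            return stem
    return None

# Hypothetical headword list of a dictionary without any inflection data.
HEADWORDS = {"work", "bring", "mouse", "carry", "make"}

for query in ("worked", "carried", "making", "brought", "xyzzy"):
    print(query, "->", lookup(query, HEADWORDS))
```

With this approach the rules are written once per language and reused for every dictionary, instead of each dictionary listing all inflected forms itself.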

A real world example of "proper" morphology support: http://www.perseus.tufts.edu/hopper/...eek&prior=a%29

Last edited by tuxor; 08-19-2013 at 11:20 AM.
Old 08-19-2013, 11:54 AM   #7
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by tuxor View Post
Here is the StarDict format specification: http://code.google.com/p/babiloo/wiki/StarDict_format - it doesn't include anything like morphology or inflections.
The StarDict input formats are indeed poorly documented, but StarDict does support inflections.

Since you're probably still fully convinced that I'm wrong, please find attached a small English-French proof-of-concept dictionary with four entries and inflections.
It'll find the singular and plural forms of mouse and louse and the conjugated verb forms of bring and catch. The .babylon source file is included.

Quote:
Originally Posted by tuxor View Post
You are definitely mixing up StarDict with some competing dictionary format like Abbyy Lingvo, MobiPocket or Babylon BGL ....
No I don't, since I've successfully created several StarDict dictionaries with inflections myself. BTW, StarDict was designed in part as a Babylon clone and supports many of its features, including inflections. However, you're correct in that it doesn't employ linguistic compression, which is used by Kindle dictionaries.
Attached Files
File Type: zip fr_en.zip (1.3 KB, 316 views)
Old 08-19-2013, 12:05 PM   #8
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Purely as a matter of interest, what is "linguistic compression"?
Old 08-19-2013, 12:06 PM   #9
tuxor
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
Quote:
Originally Posted by Doitsu View Post
The StarDict input formats are indeed poorly documented, but StarDict does support inflections.

Since you're probably still fully convinced that I'm wrong, please find attached a small English-French proof-of-concept dictionary with four entries and inflections.
It'll find the singular and plural forms of mouse and louse and the conjugated verb forms of bring and catch. The .babylon source file is included.
This is not support for morphology and/or inflections. This is support for synonyms, as I already mentioned above. And also the files you attached are simply in the StarDict format just like it is (indeed very well) documented here: http://code.google.com/p/babiloo/wiki/StarDict_format

I already talked about "simulating" morphology support using the synonyms feature of StarDict. And it is definitely insane to simulate morphology support in this way whenever the language is a bit more complex: for Hungarian, Ancient Greek or even German the synonyms file would become incredibly large, and you still wouldn't end up with "proper" morphology support.

Have a look at http://www.manpagez.com/man/4/hunspell/ for an example of what support for morphology could technically look like. For a reasonably complex language like French you only need about 1.4 MB of (uncompressed!) data in order to support basic morphology features in hunspell. With the StarDict synonym file you would definitely need more...

// EDIT

Btw: If the OP had talked about simple morphology simulation using synonyms, he wouldn't have reported that this is not working with ColorDict because ColorDict has perfect support for synonym files!

Last edited by tuxor; 08-19-2013 at 12:14 PM.
Old 08-19-2013, 01:50 PM   #10
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by HarryT View Post
Purely as a matter of interest, what is "linguistic compression"?
Linguistic compression was originally developed to save disk space and RAM. For example, most file compression methods use dictionary-based algorithms.
According to the Mobipocket website, Mobipocket/Kindle dictionaries use the Levenshtein distance algorithm to keep the file size down. For more information see this Mobipocket website article.
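
For reference, the Levenshtein distance itself is simply the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below shows only that metric; how Mobipocket applies it inside its dictionary format is not described in this thread, and the sketch makes no claim about that.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]

print(levenshtein("work", "worked"))
print(levenshtein("bring", "brought"))
```

A regular form such as "worked" is close to its headword, while an irregular form such as "brought" is several edits away, which is why a pure edit-distance search tends to miss irregular forms.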

Quote:
Originally Posted by tuxor View Post
This is not support for morphology and/or inflections. This is support for synonyms, as I already mentioned above.
You obviously don't understand the difference between inflections and synonyms. For example, "brought" is an inflection not a synonym of "bring" and any dictionary software that allows users to find a headword by searching for an inflected form does support inflections.

Quote:
Originally Posted by tuxor View Post
Btw: If the OP had talked about simple morphology simulation using synonyms, he wouldn't have reported that this is not working with ColorDict because ColorDict has perfect support for synonym files!
No s/he did not. If you re-read his/her post you'll find that s/he mentions that searching for "worked" didn't bring up the entry for "work" and "worked" just so happens to be an inflection of "work." (Not once did s/he mention synonyms.)

Quote:
Originally Posted by tuxor View Post
And also the files you attached are simply in the StarDict format just like it is (indeed very well) documented here: http://code.google.com/p/babiloo/wiki/StarDict_format
That page explains the binary format used by StarDict, not the .babylon source format that I used.

Quote:
Originally Posted by tuxor View Post
I already talked about "simulating" morphology support using the synonyms feature of StarDict. And it is definitely insane to simulate morphology support in this way whenever the language is a bit more complex: for Hungarian, Ancient Greek or even German the synonyms file would become incredibly large, and you still wouldn't end up with "proper" morphology support.
The synonym file would indeed be a bit larger, but that doesn't significantly slow down lookups. For example, I created an Arabic-English StarDict dictionary with more than 80000 entries whose lookup speed on my ancient iPhone is about the same as for other languages, even though most entries had on average 35+ inflection definitions.

You really may want to do some actual tests instead of relying purely on third-hand information!
Old 08-19-2013, 02:51 PM   #11
tuxor
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
Quote:
Originally Posted by Doitsu View Post
You obviously don't understand the difference between inflections and synonyms. For example, "brought" is an inflection not a synonym of "bring" and any dictionary software that allows users to find a headword by searching for an inflected form does support inflections.
According to your interpretation of "StarDict supports morphology", a synonym is technically the same as an inflection. It is your own example dictionary that uses the synonyms feature of StarDict to store inflections. That is all I'm referring to. I'm well aware of the definitions of a synonym and an inflection.

Quote:
Originally Posted by Doitsu View Post
No s/he did not. If you re-read his/her post you'll find that s/he mentions that searching for "worked" didn't bring up the entry for "work" and "worked" just so happens to be an inflection of "work." (Not once did s/he mention synonyms.)
You got me wrong here. The OP complains that ColorDict does not support morphology with his StarDict dictionaries. I just assumed your definition of "StarDict supports morphology" and thus concluded: "ColorDict does not support the synonyms feature of StarDict". But this is just false. That's all I said.

Quote:
Originally Posted by Doitsu View Post
That page explains the binary format used by StarDict, not the .babylon source format that I used.
I assumed we were discussing the features of the StarDict format, not some arbitrary dictionary format you mentioned.

Quote:
Originally Posted by Doitsu View Post
The synonym file would indeed be a bit larger, but that doesn't significantly slow down lookups. For example, I created an Arabic-English StarDict dictionary with more than 80000 entries whose lookup speed on my ancient iPhone is about the same as for other languages, even though most entries had on average 35+ inflection definitions.
On this point I must at least admit that such a synonyms file wouldn't exceed 100 MB and might even stay under 50 MB, which is no problem for today's hardware, even if the data is held in an array or list at runtime. Since it's usually stored in a database anyway, it shouldn't be much of a problem even on weak hardware.
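
A back-of-the-envelope check of that size estimate, using the figures from the quoted post (80000 entries, about 35 inflected forms each). The record layout (word, one terminating byte, a 4-byte index) is my understanding of the ".syn" file, and the average of 12 bytes per form (roughly six Arabic letters at two UTF-8 bytes each) is purely an assumption.

```python
def syn_file_size_bytes(headwords, forms_per_headword, avg_form_bytes):
    """Estimate the size of a synonyms file holding every inflected form."""
    # Assumed record layout: UTF-8 word + 1 NUL byte + 4-byte index.
    record_bytes = avg_form_bytes + 1 + 4
    return headwords * forms_per_headword * record_bytes

size = syn_file_size_bytes(headwords=80_000, forms_per_headword=35, avg_form_bytes=12)
size_mb = size / 1_000_000
print(f"{size_mb:.1f} MB")
```

Under these assumptions the file comes to about 47.6 MB; longer average word forms would push it above 50 MB but still far below 100 MB.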

Let me conclude: I now understand your definition of "StarDict supports morphology" and I accept it, in order to make some progress in this thread. We can now return to the OP's question: does ColorDict support morphology? Because I use ColorDict on a daily basis with dictionaries that make heavy use of the ".syn" file of the StarDict format, I have to say: yes, ColorDict supports morphology.

The OP reports that ColorDict does not show the entry for "work" when he looks up "worked". Hence we conclude that his dictionary does not have the entry "worked" in its ".syn" file. The only thing left to clarify: how come his desktop application does show the entry "work" when he looks up "worked", even though "worked" is not in the ".syn" file of the dictionary?
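
One way to check this would be to list the contents of the ".syn" file directly. The Python sketch below follows my reading of the StarDict format documentation: each ".idx" record is a NUL-terminated UTF-8 word followed by a data offset and a data size, and each ".syn" record is a NUL-terminated UTF-8 word followed by a 32-bit index into the ".idx" word list, all in network byte order. It assumes 32-bit offsets, and the sample data is built in memory for illustration rather than read from a real dictionary.

```python
import struct

def read_cstring(buf, pos):
    """Read a NUL-terminated UTF-8 string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1

def parse_idx(buf):
    """Return the headwords of an .idx file in order (32-bit offsets assumed)."""
    words, pos = [], 0
    while pos < len(buf):
        word, pos = read_cstring(buf, pos)
        _offset, _size = struct.unpack_from(">II", buf, pos)  # not needed here
        pos += 8
        words.append(word)
    return words

def parse_syn(buf, idx_words):
    """Map each synonym (or inflected form) to the headword it points at."""
    table, pos = {}, 0
    while pos < len(buf):
        synonym, pos = read_cstring(buf, pos)
        (index,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        table[synonym] = idx_words[index]
    return table

# Helpers that build tiny in-memory sample files (hypothetical data).
def make_idx(entries):
    return b"".join(w.encode("utf-8") + b"\x00" + struct.pack(">II", off, size)
                    for w, off, size in entries)

def make_syn(pairs):
    return b"".join(s.encode("utf-8") + b"\x00" + struct.pack(">I", i)
                    for s, i in pairs)

idx_bytes = make_idx([("bring", 0, 8), ("work", 8, 9)])
syn_bytes = make_syn([("brings", 0), ("brought", 0)])

idx_words = parse_idx(idx_bytes)
synonyms = parse_syn(syn_bytes, idx_words)
print(synonyms.get("brought"))  # listed form resolves to its headword
print(synonyms.get("worked"))   # None: this form is not in the .syn data
```

For a real dictionary, the two byte strings would be read from the ".idx" and ".syn" files with `open(path, "rb").read()`, after decompressing them if they are gzipped.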

Last edited by tuxor; 08-19-2013 at 02:58 PM.