MobileRead Forums > E-Book Software > KOReader

10-23-2020, 12:37 PM   #1
mzel
Connoisseur
Posts: 92
Karma: 10
Join Date: Apr 2016
Device: Kobo Forma
Question about dictionaries

Is support for morphology (genders, conjugations, contractions, etc.) a function of the dictionary or of the KOReader application?

It is not much of a problem for English, but it is for many other languages.
10-29-2020, 05:09 PM   #2
mergen3107
Wizard
Posts: 1,060
Karma: 3000026
Join Date: Feb 2012
Location: Cape Canaveral
Device: Kindle Scribe
AFAIK, StarDict (the engine used in KOReader) does not support morphology. However, it has fuzzy search, which looks for the most similar-looking word if no exact match is found. Sometimes it works, sometimes it doesn't. In the latter case, I just tap and hold the word title in the dictionary popup in KOReader and type the word I need manually.
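The fuzzy fallback described above can be illustrated with Python's standard difflib module. This is only a sketch of the general idea: it is not the matching algorithm sdcv or KOReader actually uses, and the four-word headword list is invented for the example. It also shows why such a fallback is hit-or-miss: for the inflected form "parlavano" ("they were speaking"), the similar-looking "parola" ("word") outranks the correct lemma "parlare".

```python
import difflib

def lookup(word, headwords, cutoff=0.6):
    """Return the exact headword if present, else the closest-looking ones."""
    if word in headwords:
        return [word]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops anything scoring below `cutoff`.
    return difflib.get_close_matches(word, headwords, n=3, cutoff=cutoff)

headwords = ["parlare", "parola", "partire", "porta"]
print(lookup("parlare", headwords))    # exact hit
print(lookup("parlavano", headwords))  # fuzzy fallback
```

Here lookup("parlavano", headwords) returns ['parola', 'parlare']: both candidates are offered, but the wrong one comes first.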
10-30-2020, 03:25 PM   #3
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mergen3107
AFAIK, Stardict (which engine is used in KOReader) does not support morphology.
StarDict does support inflections. The KOReader StarDict engine does not.

EDIT: I was wrong; KOReader has supported .syn files since late 2019.

Last edited by Doitsu; 11-02-2020 at 11:41 AM.
11-01-2020, 01:09 PM   #4
mzel
Connoisseur
Posts: 92
Karma: 10
Join Date: Apr 2016
Device: Kobo Forma
I guess you are talking about the .syn (synonyms) file. That is not really inflection support: it requires you to list every possible form of each word, rather than a set of rules for the language.
That means that if there are ~100 forms of a verb in Italian, you need to provide all 100 forms for every verb, as opposed to ~100 rules covering all the regular verbs combined.
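To make the trade-off concrete, here is a minimal Python sketch of how a single suffix rule set expands into the explicit list of forms that a .syn-based dictionary has to store. The ending table is a simplified assumption: it covers only the present indicative of regular -are verbs and ignores spelling changes (-care, -gare, -iare verbs) and irregular verbs.

```python
# One rule set: present-indicative endings of regular Italian -are verbs.
# Simplified assumption: no -care/-gare/-iare spelling changes, no other
# tenses, no irregular verbs.
ARE_PRESENT = ["o", "i", "a", "iamo", "ate", "ano"]

def expand(infinitive, endings=ARE_PRESENT):
    """Expand one rule set into the explicit forms a .syn-style list needs."""
    if not infinitive.endswith("are"):
        raise ValueError("this rule set only applies to -are verbs")
    stem = infinitive[:-3]
    return [stem + ending for ending in endings]

for verb in ["parlare", "amare"]:
    # Headword first, then every generated form, joined with '|'
    # as in a Babylon GLS headword line.
    print("|".join([verb] + expand(verb)))
```

The rule table stays at six entries no matter how many verbs there are, while the stored list grows by six strings per verb for this one tense alone.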
11-01-2020, 01:37 PM   #5
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mzel
I guess you are talking about the .syn (synonyms) file.
I was indeed referring to .syn files, which the KOReader StarDict engine doesn't support.

Quote:
Originally Posted by mzel
That means that if there is ~100 forms of the verb in Italian you need to provide 100 forms for each of the verbs as opposed to 100 rules for all the correct verbs combined
AFAIK, there are no cross-platform Open Source dictionary engines that support defining POS-based morphology rules.
Having to define all forms for each entry may seem like a rather primitive method, but it works surprisingly well.

BTW, if you want to add inflections to your own StarDict dictionary, you might find Tvangeste's inflection word lists for English, French, Italian, German, Spanish, Portuguese, Polish and Russian helpful.

Last edited by Doitsu; 11-02-2020 at 11:41 AM.
11-01-2020, 06:36 PM   #6
NiLuJe
BLAM!
Posts: 13,477
Karma: 26012494
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
Doesn't it? I recall a host of issues about sdcv being *slow* when dealing with synonyms, but handling them nonetheless.
11-02-2020, 09:02 AM   #7
Galunid
Zealot
Posts: 122
Karma: 43580
Join Date: Apr 2016
Device: KPW3, Kobo Clara HD, Onyx Boox Nova 2
Yup, it should support it, at least according to the issue @NiLuJe mentioned: https://github.com/koreader/koreader/issues/5437
11-02-2020, 11:40 AM   #8
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by NiLuJe
Doesn't it? I recall a host of issues about sdcv being *slow* when dealing with synonyms, but handling them nonetheless .
You are of course right. Apparently, KOReader has been supporting .syn files since 2019.

(I updated my initial post.)
11-02-2020, 10:44 PM   #9
mzel
Connoisseur
Posts: 92
Karma: 10
Join Date: Apr 2016
Device: Kobo Forma
After reading this forum and 3-4 others, I came to the conclusion that my options are:
1) Generating a .syn file from the grammar rules at the link above, or from the .aff file of a GoldenDict dictionary
2) Trying to build a command-line GoldenDict for Kobo and writing a KOReader plugin for it
3) Implementing those same rules in a Lua wrapper around sdcv that tries to find the closest match when called from KOReader
4) Finding a ready-made .syn file for the language (Italian, in this case)
5) Something else? The Kindle did a better job in this department. I mean the native Kindle reader with dictionaries built for it. The Italian-English dictionary was pretty good in this regard. The Italian-Russian one was not perfect, but still better than what we have now under KOReader: it uses the same underlying vocabulary but handles inflections much better. I never tried installing those dictionaries under KOReader on the Kindle.
6) Forgo all of the above and use manual entry plus guesswork to arrive at the correct headword

All suggestions and comments are welcome

Last edited by mzel; 11-03-2020 at 11:11 AM.
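Option 3 above can be sketched in a few lines of Python: undo one known ending, keep only the candidates that actually exist as headwords, and fall back to fuzzy matching or manual entry when nothing matches. The rule table is a hypothetical, heavily simplified assumption (it is not Hunspell .aff syntax and not anything KOReader ships), and passing the result on to sdcv is not shown.

```python
# Hypothetical reverse rules: (inflected ending, lemma ending).
REVERSE_RULES = [
    ("iamo", "are"), ("ate", "are"), ("ano", "are"),  # regular -are verbs
    ("o", "are"), ("i", "are"), ("a", "are"),
    ("a", "o"), ("i", "o"), ("e", "o"),               # adjectives in -o
]

def candidate_lemmas(word, headwords, rules=REVERSE_RULES):
    """Return headwords reachable from `word` by undoing one ending rule."""
    if word in headwords:
        return [word]
    found = []
    for ending, replacement in rules:
        if word.endswith(ending) and len(word) > len(ending):
            lemma = word[: -len(ending)] + replacement
            # Only accept candidates the dictionary actually contains.
            if lemma in headwords and lemma not in found:
                found.append(lemma)
    return found

headwords = {"parlare", "parola", "abbacchiato"}
print(candidate_lemmas("parliamo", headwords))     # ['parlare']
print(candidate_lemmas("abbacchiate", headwords))  # ['abbacchiato']
print(candidate_lemmas("parlavano", headwords))    # [] (imperfect not covered)
```

Checking every candidate against the headword list is what keeps crude rules like these from returning nonsense; the cost is that anything the rules don't cover, such as the imperfect "parlavano", still falls through to fuzzy search.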
11-03-2020, 02:17 AM   #10
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mzel
4) finding a ready-made .syn file for the language - Italian in this case
AFAIK, .syn files are dictionary-specific index files that can't be re-used. They're automatically generated by StarDict Editor when you compile a Babylon GLS source file.
Quote:
Originally Posted by mzel
5) Something else? Kindle was able to do a better job in this department.
You could use KindleUnpack to unpack one of the free bilingual Oxford dictionaries that Amazon offers as optional downloads to eInk Kindle users, and then extract the inflection data.

Here's an example entry from the Italian-English Oxford dictionary:

Code:
<idx:orth value="abbacchiato">
    <idx:infl>
        <idx:iform name="" value="abbacchiata"/>
        <idx:iform name="" value="abbacchiate"/>
        <idx:iform name="" value="abbacchiati"/>
    </idx:infl>
</idx:orth>
The Babylon GLS equivalent is:

Code:
abbacchiato|abbacchiata|abbacchiate|abbacchiati
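Converting the Kindle markup above into the GLS form can be scripted. A minimal Python sketch using regular expressions, assuming the unpacked source contains idx:orth/idx:iform markup shaped exactly like the example (real files may use self-closing idx:orth tags, extra attributes, or HTML entities that this does not handle):

```python
import re

# Assumes markup shaped like the example; self-closing <idx:orth/> tags,
# extra attributes or HTML entities in real KindleUnpack output are not handled.
ORTH_RE = re.compile(r'<idx:orth\s+value="([^"]*)"\s*>(.*?)</idx:orth>', re.S)
IFORM_RE = re.compile(r'<idx:iform\b[^>]*\bvalue="([^"]*)"')

def orth_to_gls(markup):
    """Yield one Babylon GLS headword line per <idx:orth> entry."""
    for headword, body in ORTH_RE.findall(markup):
        forms = [f for f in IFORM_RE.findall(body) if f != headword]
        # GLS lists the headword and its inflected forms separated by '|'
        yield "|".join([headword] + forms)

sample = """<idx:orth value="abbacchiato">
    <idx:infl>
        <idx:iform name="" value="abbacchiata"/>
        <idx:iform name="" value="abbacchiate"/>
        <idx:iform name="" value="abbacchiati"/>
    </idx:infl>
</idx:orth>"""

for line in orth_to_gls(sample):
    print(line)
```

A regex is used instead of an XML parser because the idx: prefix is not declared in such fragments, which a strict XML parser would reject.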
11-03-2020, 10:52 AM   #11
mzel
Connoisseur
Posts: 92
Karma: 10
Join Date: Apr 2016
Device: Kobo Forma
re 4) IMHO, the only dictionary-specific part of the .syn is the set of base words; otherwise it should be language-specific. As far as I understand, the .syn is part of the input to the StarDict converter, not its output.
re 5) Again, that would only be an intermediate step toward creating a .syn for StarDict. BTW, do you have a link to those dictionaries?
11-03-2020, 12:12 PM   #12
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mzel
As far as I understand .syn is part of the input, not output for the Stardict converter.
No, the opposite is true. StarDict Editor will automatically generate a .syn file if the Babylon GLS source file contains inflections.
This also means that .syn files are not interchangeable.
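Why .syn files are tied to one dictionary is easier to see from their layout. As far as I understand the StarDict file-format documentation, each .syn entry is a NUL-terminated UTF-8 form followed by a 32-bit big-endian number giving the position of the original headword in that dictionary's .idx, with entries sorted the same way as the .idx. The Python sketch below builds one under those assumptions (32-bit .idx offsets only); it is not StarDict Editor's code, make_test_idx is a hypothetical helper that fakes an .idx, and any real output should be checked with a StarDict tool.

```python
import struct

def ascii_lower(b: bytes) -> bytes:
    """Lowercase ASCII A-Z only, mimicking GLib's g_ascii_strcasecmp."""
    return bytes(c + 32 if 65 <= c <= 90 else c for c in b)

def stardict_key(word: str):
    """Approximate StarDict ordering: ASCII case-insensitive first,
    then a plain byte comparison to break ties."""
    b = word.encode("utf-8")
    return (ascii_lower(b), b)

def read_idx_words(idx: bytes) -> list:
    """Headwords in .idx order, assuming 32-bit offsets (no idxoffsetbits=64)."""
    words, pos = [], 0
    while pos < len(idx):
        end = idx.index(b"\0", pos)
        words.append(idx[pos:end].decode("utf-8"))
        pos = end + 1 + 8  # skip the NUL, the 4-byte offset and the 4-byte size
    return words

def build_syn(idx: bytes, inflections: dict) -> bytes:
    """inflections maps headword -> list of inflected forms."""
    position = {w: i for i, w in enumerate(read_idx_words(idx))}
    entries = []
    for head, forms in inflections.items():
        if head not in position:
            continue  # a form can only point at a headword of THIS .idx
        entries.extend((form, position[head]) for form in forms)
    entries.sort(key=lambda e: stardict_key(e[0]))
    out = bytearray()
    for form, i in entries:
        out += form.encode("utf-8") + b"\0" + struct.pack(">I", i)
    return bytes(out)

def make_test_idx(words) -> bytes:
    """Hypothetical helper: a fake .idx with dummy offset/size fields."""
    out = bytearray()
    for w in sorted(words, key=stardict_key):
        out += w.encode("utf-8") + b"\0" + struct.pack(">II", 0, 0)
    return bytes(out)

idx = make_test_idx(["parlare", "abbacchiato"])
syn = build_syn(idx, {"abbacchiato": ["abbacchiati", "abbacchiata"]})
print(syn)
```

Because each stored number is a position in one particular .idx, the same .syn bytes would point at the wrong headwords, or past the end, in any other dictionary. The dictionary's .ifo file would also need a matching synwordcount= line.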

You might want to compile a simple Babylon GLS source file yourself with StarDict Editor. (I attached a small test file to this post; the input file is fr_en_sample.babylon)

Quote:
Originally Posted by mzel
BTW do you have the link to those dictionaries?
No, but if you own a registered eInk Kindle, you can download them for free.
11-03-2020, 01:27 PM   #13
pazos
cosiñeiro
Posts: 1,271
Karma: 2200049
Join Date: Apr 2014
Device: BQ Cervantes 4
Quote:
Originally Posted by mzel
2) Trying to build a command line Goldendict for Kobo and write a plugin for it in koreader
3) Implementing those same rules in .lua wrapper around sdcv and trying to find a closest match from koreader
Contributions are always welcome.

Keep in mind a few things:

- GoldenDict is not a dictionary format. It is an app that supports plenty of dictionary formats. You'll need to choose which of the GoldenDict-supported formats you want to support, peek into the GoldenDict code, and get something working for that specific format.

- There's no need to build a command-line tool. A Lua C module would be fine too.

- Lua is too slow for a dictionary app. The code needs to be written in C/C++. The interface between your code and KOReader can be coded however you want.

Of the supported colordict formats, slob and zim look the most interesting to me, but I don't know if they have the features you need.

The end goal is to have a program/library that works on all KOReader devices, which are mostly Linux ARM devices. Support for third-party apps is already available on devices that are intended to run apps (Android, Linux, macOS).
01-17-2021, 08:27 PM   #14
mzel
Connoisseur
Posts: 92
Karma: 10
Join Date: Apr 2016
Device: Kobo Forma
Quote:
Originally Posted by Doitsu
No, the opposite is true. StarDict Editor will automatically generate a .syn file if the Babylon GLS source file contains inflections.
This also means that the .syn files are not interchangable.

You might want to compile a simple Babylon GLS source file yourself with StarDict Editor. (I attached a small test file to this post; the input file is fr_en_sample.babylon)


No, but if you own a registered eInk Kindle, you can download them for free.
I did try, in more than one way, and failed. I downloaded that dictionary, but KindleUnpack fails on it with:

Code:
line 770, in process_all_mobi_headers
    raise unpackException('Book is encrypted')
lib.kindleunpack.unpackException: Book is encrypted

Error: Unpacking Failed
DRM removal for that file does not work either. I tried both the Calibre plugin and the standalone Kindle DRM removal tool; both fail at some point. After that I found:
https://github.com/apprenticeharper/...ols/issues/276
So at least I am not alone.
01-23-2021, 12:11 AM   #15
kandwo
Addict
Posts: 356
Karma: 10703708
Join Date: Dec 2020
Device: Kindle Paperwhite 3
I would recommend not relying on KOReader's internal dictionary, but rather using GoldenDict, which works fine with inflections. You have to install both the dictionary (StarDict or another compatible format) and a Hunspell dictionary for the language so that it recognizes the inflected forms.

This works well for the languages that I've tried so far: Russian, Ukrainian and English.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.