Old 07-04-2020, 03:20 PM   #1
karl42
Junior Member
Posts: 6
Karma: 13090
Join Date: Jul 2020
Device: Kobo Forma
New bilingual Kobo dictionaries based on Wiktionary

Hi everyone!

As part of my project WikDict, I have created a large set of bilingual dictionaries in both the native Kobo dictionary format and StarDict format. If you want to have a look at the data quality before downloading any dictionaries, I recommend using the web interface at https://www.wikdict.com, which is based on the same data source.

Kobo dictionaries: http://download.wikdict.com/dictionaries/kobo/
Stardict dictionaries: http://download.wikdict.com/dictionaries/stardict/
License: CC-BY-SA 3.0

Please give me some feedback!
Old 07-04-2020, 03:59 PM   #2
geek1011
Wizard
Posts: 2,736
Karma: 6990705
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Nice work overall!

I haven't taken a close look at it yet, but PyGlossary (based on your GH profile and the output, it appears that that's what you're using to convert the dictionaries) has some bugs in its handling of prefixes for Kobo's dictionary format.

In particular, uppercase words (e.g. Cumberland in sv-en) will appear in the autocomplete, but looking them up shows "no definition found". See here for my notes on proper prefix generation.

One way to fix this automatically without making modifications to PyGlossary would be to use dictzip-decompile and dictgen (see my thread about dictutil) to regenerate the dictionaries.
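
To illustrate why case matters for the prefix, here is a minimal sketch in Python. It is a simplified illustration, not the exact rule set used by Kobo, dictutil, or PyGlossary: the function name `sketch_prefix` is hypothetical, and the lowercasing, the padding with "a", and the catch-all "11" bucket are assumptions based on my understanding of the prefix notes. Real implementations handle more cases.

```python
def sketch_prefix(word: str) -> str:
    """Simplified illustration of a dicthtml-style prefix (assumptions, not the exact rules)."""
    # Assumption: surrounding whitespace is ignored and case is folded,
    # so "Cumberland" and "cumberland" end up under the same prefix.
    chars = word.strip().lower()[:2]
    # Assumption: words shorter than two characters are right-padded with "a".
    chars = chars.ljust(2, "a")
    # Assumption: anything that does not start with two letters goes to a catch-all "11" bucket.
    if not all(c.isalpha() for c in chars):
        return "11"
    return chars


print(sketch_prefix("Cumberland"))
print(sketch_prefix("cumberland"))
```

If the generator skips the lowercasing step, an uppercase headword is filed under a prefix the reader never looks up, which would match the "no definition found" symptom described above.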
Old 07-05-2020, 06:08 AM   #3
karl42
Junior Member
Posts: 6
Karma: 13090
Join Date: Jul 2020
Device: Kobo Forma
Quote:
Originally Posted by geek1011 View Post
I haven't taken a close look at it yet, but PyGlossary (based on your GH profile and the output, it appears that that's what you're using to convert the dictionaries) has some bugs in its handling of prefixes for Kobo's dictionary format.
You're right, I'm using pyglossary. I didn't know about that problem and neither did the maintainer. See https://github.com/ilius/pyglossary/issues/219 for the progress on fixing this.

Quote:
Originally Posted by geek1011 View Post
See here for my notes on proper prefix generation.
That is very helpful, thanks! Could you clarify how the last two examples work? The input is the same for both (at least visually), but they yield different prefixes.
Old 07-05-2020, 11:03 AM   #4
geek1011
Wizard
Posts: 2,736
Karma: 6990705
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by karl42 View Post
That is very helpful, thanks! Could you clarify how the last two examples work? The input is the same for both (at least visually), but they yield different prefixes.
That was an issue in the page due to whitespace collapsing. The first one had two spaces, but the second had one. I've responded on the PyGlossary issue.
Old 07-05-2020, 01:24 PM   #5
karl42
Junior Member
Posts: 6
Karma: 13090
Join Date: Jul 2020
Device: Kobo Forma
I updated the Kobo dictionaries with the recent pyglossary fixes. If anything else seems to be wrong, don't hesitate to let me know!
Old 07-08-2020, 04:47 AM   #6
Carmelocotonto
Connoisseur
Posts: 93
Karma: 12
Join Date: Nov 2018
Location: Salamanca
Device: kobo Clara HD, Onyxboox C67
Quote:
Originally Posted by karl42 View Post
I updated the Kobo dictionaries with the recent pyglossary fixes. If anything else seems to be wrong, don't hesitate to let me know!
Hi Karl, your work is spectacular. For fun, I have used your TEI files to build a single "several languages to Spanish" dictionary. When you publish new TEI files, I will add them to my dictionary. Many thanks.
Old 02-13-2021, 11:30 AM   #7
InMyPocket
Member
Posts: 21
Karma: 3620
Join Date: Feb 2021
Device: Pocketbook
Hi Karl,

It seems that the EN-FR and EN-NL dictionaries are missing some words/expressions: e.g. words between "trough" and "trouser press" are missing in the dicthtml-en-nl file, and words between "trout" and "trouvère" in the dicthtml-en-fr file. So, the word "trouser" is missing in both of them (but it is on the wikdict.com website).

Is this a bug in the conversion to dicthtml? Thanks a lot for your great work!
Old 02-24-2021, 08:05 AM   #8
karl42
Junior Member
Posts: 6
Karma: 13090
Join Date: Jul 2020
Device: Kobo Forma
Quote:
Originally Posted by InMyPocket View Post
Hi Karl,
It seems that the EN-FR and EN-NL dictionaries are missing some words/expressions: e.g. words between "trough" and "trouser press" are missing in the dicthtml-en-nl file, and words between "trout" and "trouvère" in the dicthtml-en-fr file. So, the word "trouser" is missing in both of them (but it is on the wikdict.com website).

Is this a bug in the conversion to dicthtml?
I've had a closer look at the case of "trouser". The conversion is fine. The word is not included because the Wiktionary page for "trousers" does not contain any translations but only a "see pants" reference:
https://en.wiktionary.org/wiki/trousers

I'm not yet able to handle these references, so the translation is missing in the dictionary.

Why is it shown on wikdict.com, then? Since I control the whole search and sorting UI on that page, I can take a few more liberties in deciding when to show translations to the user. One aspect of that is that when no results are found, I try to synthesize translations by looking at Wiktionaries in other languages. In this specific case, I returned a translation by going through the Finnish Wiktionary:
trousers (en) -> housut (fi) -> pantalon (fr)
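
The fallback described above can be sketched roughly as follows. This is only an illustration of the pivot idea, not the actual WikDict code: the function name `translate_with_pivot` and the tiny hand-written dictionaries are hypothetical stand-ins for the real data.

```python
# Hypothetical toy data standing in for the real translation tables.
EN_FR = {}                          # no direct en -> fr entry for "trousers"
EN_FI = {"trousers": ["housut"]}    # en -> fi
FI_FR = {"housut": ["pantalon"]}    # fi -> fr


def translate_with_pivot(word, direct, first_leg, second_leg):
    """Return direct translations if present, else go through a pivot language."""
    if word in direct:
        return direct[word]
    results = []
    for pivot_word in first_leg.get(word, []):
        for target in second_leg.get(pivot_word, []):
            if target not in results:  # keep order, drop duplicates
                results.append(target)
    return results


print(translate_with_pivot("trousers", EN_FR, EN_FI, FI_FR))
```

Pivoted results can be less reliable than direct ones, since ambiguity in the pivot word carries over, which is one reason to use them only as a fallback.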

If you would like updates on this issue, please subscribe to it on [GitHub](https://github.com/karlb/wikdict-gen/issues/5).
Old 02-25-2021, 07:04 PM   #9
genoasalami
Connoisseur
Posts: 52
Karma: 20
Join Date: Apr 2017
Device: KK3G, PW, Voyage, Oasis, Aura One, Forma
Quote:
Originally Posted by karl42 View Post

Please give me some feedback!
What would be involved in converting a single language offline wiki to a Stardict or dicthtml file? I know that wiki is huge, but maybe some slimmed-down version?

It would be nice to have this offline.
Old 02-26-2021, 01:54 AM   #10
karl42
Junior Member
Posts: 6
Karma: 13090
Join Date: Jul 2020
Device: Kobo Forma
Quote:
Originally Posted by genoasalami View Post
What would be involved in converting a single language offline wiki to a Stardict or dicthtml file? I know that wiki is huge, but maybe some slimmed-down version?
If you want the complete wiki pages, the process would probably be
* Download wiki dump
* Read wiki markup from dump and convert to HTML pages
* Combine HTML pages into StarDict dictionary

Unless you are lucky and find tools that do this exactly the way you want (or someone has already done the steps for you), this will involve at least some amount of programming. Also note that it can be cumbersome to find information in large wiki pages (e.g. translations in Wiktionary pages).
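
As a rough idea of what the "read wiki markup from dump" step might look like, here is a minimal sketch using only the Python standard library. The inline `SAMPLE_DUMP` is a hypothetical, heavily shortened stand-in for a real dump file, and the helper names are made up for illustration. Converting the markup to HTML and building the dictionary are not covered.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature dump; a real one is far larger and has more fields.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>trousers</title>
    <revision><text>See [[pants]].</text></revision>
  </page>
  <page>
    <title>trout</title>
    <revision><text>A freshwater fish.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the XML namespace, e.g. '{...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wiki_markup) pairs without loading the whole dump into memory."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free the memory used by the finished page


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
for page_title, markup in pages:
    print(page_title, "->", markup)
```

For a real dump you would pass an opened (and decompressed) file instead of the in-memory sample.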

Since I go the different route of using a [semantically parsed version of Wiktionary](http://kaiko.getalp.org/about-dbnary/) and creating my own pages from that, I can't give specific instructions. I'm also not generating StarDict files directly, but I'm using [pyglossary](https://github.com/ilius/pyglossary).
Old 02-26-2021, 03:02 AM   #11
Semwize
Guru
Posts: 873
Karma: 252902
Join Date: Jun 2016
Device: Kobo
Quote:
Originally Posted by karl42 View Post
* Read wiki markup from dump and convert to HTML pages
PyGlossary seems to support .zim, so you could convert directly to the desired format, Kobo or StarDict. Or not?
Old 03-21-2021, 07:57 PM   #12
takfarinas
Member
Posts: 24
Karma: 10
Join Date: Jan 2021
Device: kobo h2o libra
Please, I need de-ar, fr-ar, it-ar

Quote:
Originally Posted by karl42 View Post
Hi everyone!

As part of my project WikDict, I have created a large set of bilingual dictionaries in both the native Kobo dictionary format and StarDict format. If you want to have a look at the data quality before downloading any dictionaries, I recommend using the web interface at https://www.wikdict.com, which is based on the same data source.

Kobo dictionaries: http://download.wikdict.com/dictionaries/kobo/
Stardict dictionaries: http://download.wikdict.com/dictionaries/stardict/
License: CC-BY-SA 3.0

Please give me some feedback!
Hello,

Please, I need German/Arabic, Italian/Arabic, and French/Arabic dictionaries. I am tired of searching for help, and I know very little about computers. I have uploaded three very good dictionaries in EPUB form, but I can't convert them to the Kobo format. Thanks a lot if you can help me.
Old 10-10-2022, 06:45 AM   #15
PetBest
Member
Posts: 23
Karma: 10
Join Date: Jun 2022
Device: Kobo Libra 2
Norwegian-English dictionary

This file, dicthtml-no-en.zip, was converted from an unzipped StarDict file. The original is stardict-comn_sdict05_norwegian-english-2.4.2.tar.bz2, converted with the following command:

penelope -i /home/petbest/Downloads/Dict//stardict-comn_sdict05_norwegian-english-2.4.2.zip -j stardict -f no -t en -p kobo -o /home/petbest/Downloads/Dict/dichthtml-no-en
Attached Files
File Type: zip dicthtml-no-en.zip (244.1 KB, 416 views)
Tags
dictionaries, dictionary, kobo, stardict, translation


All times are GMT -4.