Old 01-29-2021, 04:50 PM   #1
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
JMdict on Kobo (or any other recommended Japanese dictionaries?)

I've had a custom Japanese-English dictionary on my Kobo forever. It still works fine, but it was built from a copy of Edict that is now about 5 years old (from when the dictionary was made), so I wanted to update it to a newer version and switch to JMdict.

I converted the latest JMdict to Kobo format using Pyglossary. The resulting file worked to some extent on my Kobo, but there was something wrong with it -- I could look up some words, but it would fail to find anything for other words. The words it didn't find were common words, so they should have been there, and I confirmed that they were in the original JMdict file I used.
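For anyone who wants to repeat the "is the word really in JMdict?" check, here is a minimal Python sketch. It relies on the standard JMdict XML element names (`entry`, `keb` for kanji forms, `reb` for readings, `gloss` for translations). The `SAMPLE` data and the `find_entries` helper are made up for illustration; to check a real download you would pass the path of the JMdict XML file instead of the in-memory sample.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in JMdict's layout (illustrative data only).
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq>
  <k_ele><keb>日本語</keb></k_ele>
  <r_ele><reb>にほんご</reb></r_ele>
  <sense><gloss>Japanese (language)</gloss></sense>
</entry>
<entry><ent_seq>2</ent_seq>
  <r_ele><reb>ありがとう</reb></r_ele>
  <sense><gloss>thank you</gloss></sense>
</entry>
</JMdict>""".encode("utf-8")


def find_entries(source, word):
    """Yield the gloss list of every entry whose kanji form (keb)
    or reading (reb) is exactly `word`."""
    # iterparse streams the file, so a large dictionary is not loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entry":
            continue
        forms = [e.text for e in elem.iter() if e.tag in ("keb", "reb")]
        if word in forms:
            yield [g.text for g in elem.iter("gloss")]
        elem.clear()  # free the entry once it has been inspected


print(list(find_entries(io.BytesIO(SAMPLE), "にほんご")))
print(list(find_entries(io.BytesIO(SAMPLE), "犬")))
```

An empty result means the word is not a headword or reading in the source file, which separates "missing from JMdict" from "lost or unreachable after conversion".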

I'm not sure if I did something wrong, or if there is a bug in Pyglossary causing the issue. I thought I'd ask around here first to see if anyone had already done what I'm trying to do, before I submit an issue to the Pyglossary folks.

I'm also not locked into the idea of using JMdict if there's a better JE dictionary for Kobo floating around out there.
Old 01-30-2021, 12:56 AM   #2
DevonHess
Can't actually read
Posts: 81
Karma: 335656
Join Date: Sep 2019
Device: Kobo Forma, Kobo Sage, Kindle PW2
I'm not sure what Pyglossary is, but the dictionary converter I've seen recommended the most is dictutil.

It's created by geek1011, a regular poster on this forum, so you might get better support if you try using that.
Old 01-30-2021, 09:52 AM   #3
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
Thanks for the recommendation. I looked at the various dictionary utilities in dictutil, and it seems like you need to have a fairly high level of technical knowledge about the dictionaries' formats in order to do something useful with it. While I am pretty technical, I don't know much about these formats.

The tool I heard the most about "back in the day" (years ago) when I previously played with this stuff was Penelope. When I looked that up, I found that the project had been retired and Pyglossary was the tool recommended by the Penelope developer. From what I understand, the Penelope code was incorporated into Pyglossary.
Old 01-30-2021, 10:16 AM   #4
Semwize
Guru
 
Posts: 914
Karma: 275656
Join Date: Jun 2016
Device: Kobo
Quote:
Originally Posted by Ceiyne View Post
Thanks for the recommendation. I looked at the various dictionary utilities in dictutil, and it seems like you need to have a fairly high level of technical knowledge about the dictionaries' formats in order to do something useful with it. While I am pretty technical, I don't know much about these formats.

The tool I heard the most about "back in the day" (years ago) when I previously played with this stuff was Penelope. When I looked that up, I found that the project had been retired and Pyglossary was the tool recommended by the Penelope developer. From what I understand, the Penelope code was incorporated into Pyglossary.
It's easier than that.

Use pyglossary (latest version) to convert JMdict / EDICT (XML, if I'm not mistaken) to a .df file, and then use dictgen. This should help you.
Old 01-30-2021, 10:42 AM   #5
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
Ah, good idea. Thank you, I'll try that.
Old 01-30-2021, 11:50 AM   #6
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
Quote:
Originally Posted by Semwize View Post
Use pyglossary (latest version) to convert JMdict / EDICT (XML, if I'm not mistaken) to a .df file, and then use dictgen. This should help you.
I tried this, but unfortunately it didn't work either. It successfully generated a dictionary, but after loading it on the Kobo it never found an entry for any of the words I looked up (words I know are present in JMdict). I filed an issue on the dictgen GitHub, so hopefully I'll find out what's wrong (I also filed one on Pyglossary earlier today).

It's quite possible it's user error on my part, but I don't see anything I'm doing wrong, and the programs are running without error and generating something that at least "looks" right, to the extent I can check.
Old 01-30-2021, 11:59 AM   #7
Semwize
Guru
 
Posts: 914
Karma: 275656
Join Date: Jun 2016
Device: Kobo
Quote:
Originally Posted by Ceiyne View Post
I don't see anything I'm doing wrong and the programs are running without error and generating something that at least "looks" right, to the extent I can check.
Open the .df file in any text editor (I use EmEditor) and search for these words. That way you can also see for yourself why the Kobo isn't finding them.
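If searching by hand gets tedious, the same check can be scripted. This is only a rough sketch: it assumes the dictfile layout described in the dictutil documentation, where a line starting with `@ ` begins an entry and lines starting with `& ` list variants. The sample text and the `headwords` helper are made up for illustration, and a definition line that happens to start with `& ` would be misread as a variant.

```python
def headwords(dictfile_text):
    """Collect every headword ("@ " line) and variant ("& " line)."""
    words = set()
    for line in dictfile_text.splitlines():
        if line.startswith("@ ") or line.startswith("& "):
            words.add(line[2:].strip())
    return words


# Hand-written sample in the assumed dictfile layout (illustrative only).
SAMPLE_DF = """@ 日本語
: にほんご
& にほんご
Japanese (language)

@ 猫
& ねこ
cat
"""

found = headwords(SAMPLE_DF)
# Words that failed to look up on the device go in this list.
missing = [w for w in ["日本語", "ねこ", "犬"] if w not in found]
print(missing)
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `headwords`. Words that show up as present here but still fail on the device point at the lookup side rather than at the conversion input.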
Old 01-30-2021, 12:14 PM   #8
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
I tried that, and the words do appear in the df file, and the entries look right as far as I can tell (based on what I see in the dictutil documentation). I got responses from both the Pyglossary developer and the dictutil developer on one of the issues I opened and it sounds like Japanese is not fully supported by Pyglossary/dictutil.

https://github.com/ilius/pyglossary/issues/292

So, I guess I have to wait. In case anyone comes across this thread later and wants to see the status of these issues, here's the other one I opened on dictutil:

https://github.com/pgaskin/dictutil/issues/16
Old 01-30-2021, 12:20 PM   #9
Semwize
Guru
 
Posts: 914
Karma: 275656
Join Date: Jun 2016
Device: Kobo
@Ceiyne

Try converting JMdict to StarDict using pyglossary, and then use Penelope v3.1.3 to convert that to dicthtml.

I wonder how it handles Japanese.

Although geek1011 writes that ALL existing tools will not handle it correctly. I don't know if he also included Penelope in that list.

Last edited by Semwize; 01-30-2021 at 12:23 PM.
Old 01-30-2021, 01:28 PM   #10
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
I tried using pyglossary+Penelope like you said, but it didn't work either. The files were generated without error, but the Kobo couldn't find anything in the dictionary. Thank you for all of your suggestions, but I think I'm just going to wait for geek1011 and the other devs to figure it all out.
Old 01-30-2021, 02:04 PM   #11
Semwize
Guru
 
Posts: 914
Karma: 275656
Join Date: Jun 2016
Device: Kobo
Quote:
Originally Posted by Ceiyne View Post
From what I understand, the Penelope code was incorporated into Pyglossary.
As I understand it, the latest versions of Pyglossary use dictutil and not Penelope. That is why I suggested checking whether the same error occurs with Penelope. Unfortunately, it turned out that it does.

It is strange that in all the years the Penelope project has existed, nobody paid attention to this. Or maybe I just didn't see it.
Old 01-30-2021, 03:12 PM   #12
geek1011
Wizard
 
Posts: 2,804
Karma: 7025947
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by Ceiyne View Post
When I looked that up, I found that the project had been retired and Pyglossary was the tool recommended by the Penelope developer. From what I understand, the Penelope code was incorporated into Pyglossary.
Yes, that's correct. Recently, the plugin was rewritten to match dictutil's code.

Quote:
Originally Posted by Ceiyne View Post
I got responses from both the Pyglossary developer and the dictutil developer on one of the issues I opened and it sounds like Japanese is not fully supported by Pyglossary/dictutil.
Yes, the dictutil developer is me, and I've responded to both issues.

Quote:
Originally Posted by Semwize View Post
I wonder how it handles Japanese.

Although geek1011 writes that ALL existing tools will not handle it correctly. I don't know if he also included Penelope in that list.
That's also correct. None of the existing tools handle the Japanese prefix generation and word matching algorithms, and nobody's bothered to reverse engineer it yet.

Quote:
Originally Posted by Semwize View Post
As I understand it, the latest versions of Pyglossary use dictutil and not Penelope. That is why I suggested checking whether the same error occurs with Penelope. Unfortunately, it turned out that it does.
Yep.

Quote:
It is strange that in all the years the Penelope project has existed, nobody paid attention to this. Or maybe I just didn't see it.
Penelope was created during the time of v1 (pre-2017) dictionaries. At that point, the prefix generation was much simpler. In addition, it didn't try to handle variants, images, or other advanced features at all; it stuck to simple word-definition pairs.

Dictutil handles v2 dictionaries, and I'll be releasing v3 (post-fall-2020) dictionary support as soon as I have enough time. Note that I should be able to work around the Japanese differences entirely using the new v3 prefix exception mechanism.
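To illustrate why a prefix mismatch makes lookups fail even though the entries are in the file, here is a toy Python model. Both prefix rules below are hypothetical and simplified for illustration. Neither is the actual firmware algorithm (which, as noted above, has not been reverse engineered for Japanese), and neither is dictutil's real code.

```python
def generator_prefix(word):
    """HYPOTHETICAL rule a conversion tool might use: first two characters,
    lowercased, padded with "a"; anything containing a non-letter goes to "11"."""
    p = word.strip().lower()[:2].ljust(2, "a")
    return p if p.isalpha() else "11"


def is_japanese(ch):
    # Crude check: hiragana/katakana block or common CJK ideographs.
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def reader_prefix(word):
    """HYPOTHETICAL rule a device might apply instead: a single character
    for Japanese text, the generator's rule for everything else."""
    first = word.strip()[:1]
    return first if first and is_japanese(first) else generator_prefix(word)


def build_shards(words, prefix_fn):
    """Group words into shards keyed by prefix, as the generator would."""
    shards = {}
    for w in words:
        shards.setdefault(prefix_fn(w), set()).add(w)
    return shards


def lookup(shards, word, prefix_fn):
    """Open only the shard the reader's prefix rule points at."""
    return word in shards.get(prefix_fn(word), set())


shards = build_shards(["hello", "日本語"], generator_prefix)
print(lookup(shards, "hello", reader_prefix))   # rules agree for "hello"
print(lookup(shards, "日本語", reader_prefix))  # rules disagree for "日本語"
```

In this toy model the Japanese word is stored under "日本" but looked up under "日", so it is never found, even though the generator ran without error and the entry is present. That matches the symptom described in this thread.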
Old 02-03-2021, 10:38 AM   #13
toancv
Connoisseur
 
Posts: 69
Karma: 10
Join Date: Nov 2018
Device: Kindle paperwhite, Likebook Mars, Kobo Aura Ed. 2, Kobo Touch
Ceiyne, may I have your dictionary file? Maybe I can find the reason why.
Old 02-03-2021, 10:52 PM   #14
Ceiyne
Enthusiast
 
Posts: 37
Karma: 24602
Join Date: May 2015
Device: Kobo Glo HD
I appreciate the offer, but the developer has said that the current conversion tools don't support Japanese, so I don't think the file is going to be useful.
Old 02-04-2021, 12:23 AM   #15
toancv
Connoisseur
 
Posts: 69
Karma: 10
Join Date: Nov 2018
Device: Kindle paperwhite, Likebook Mars, Kobo Aura Ed. 2, Kobo Touch
Hi Ceiyne, it's up to you. On my end, the Japanese and Chinese dictionaries work OK when converted with either Penelope or dictutil. They work perfectly with the current firmware, 4.25. If I am not wrong, Kobo only has a problem with dictionaries that contain more than about 65k files (the Chinese dictionary), which I cannot find a solution for here.
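A quick way to see how close a converted dictionary is to that file count is to count the entries in the archive. This sketch assumes the generated dictionary is an ordinary ZIP file, and it assumes the "65k" figure corresponds to the classic ZIP format's 16-bit entry count of 65,535. Whether that is really the cause of the Kobo problem is a guess, not something confirmed in this thread. The demo archive is built in memory with made-up file names.

```python
import io
import zipfile

# Classic (non-ZIP64) archives store the entry count in a 16-bit field.
ZIP_CLASSIC_MAX_ENTRIES = 65535


def entry_count(zip_source):
    """Return the number of files stored in a ZIP archive (path or file object)."""
    with zipfile.ZipFile(zip_source) as zf:
        return len(zf.namelist())


def exceeds_classic_limit(zip_source):
    """True if the archive holds more entries than a classic ZIP can index."""
    return entry_count(zip_source) > ZIP_CLASSIC_MAX_ENTRIES


# Demo: a tiny in-memory archive standing in for a converted dictionary.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("aa.html", "ab.html", "11.html"):
        zf.writestr(name, "<html></html>")

print(entry_count(buf))            # prints 3
print(exceeds_classic_limit(buf))  # prints False
```

For a real dictionary, pass the path of the generated archive instead of `buf`.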