Old 02-09-2020, 04:47 PM   #1
Peripathetic
Enthusiast
Peripathetic composes epic poetry in binary.
 
Posts: 38
Karma: 90402
Join Date: Feb 2019
Device: Tolino Shine 3
QuickDicBuilder: Custom dictionaries on the Tolino

Dictionaries used by the Tolino app are stored under .tolino/dictionaries/ on the user data partition. The format used is that of QuickDic (*.quickdic).

Existing Dictionaries

The original QuickDic was an Android app written by Thad Hughes and eventually open-sourced. Dictionary files were hosted on Google Code and available for download but all of them got deleted and were apparently lost when Google shut down the website. A Web Archive snapshot of the project repository is available but files cannot be downloaded this way.

The project was later resurrected as QuickDic Restored by Reimar Döffinger. The author's repository contains a lot of dictionaries generated from Wiktionary, a sister project of Wikipedia, which was also the source of the original QuickDic data. However, as part of his work on the app, the author improved the dictionary format, which means that newer dictionaries (v007 instead of v006) are no longer compatible with the Tolino.

These Wiktionary-based dictionaries can be downloaded on GitHub.

Make sure to download only the files labeled v006.

Creating Dictionaries: The Tool

DictionaryPC is a Java tool, accompanying the QuickDic app, for generating QuickDic dictionaries.

GitHub user Gitsaibot authored shell scripts for generating QuickDic dictionaries specifically with the Tolino in mind (the .jar file there is exactly the same as in the original project).

Since it is a Java application, it needs a JRE to run (portable version). Further, it requires the following libraries: Commons Compress, Commons Lang3, International Components for Unicode, Xerces-J Impl.

For convenience, I packaged everything necessary to run it in a Windows environment into a single archive, which I named QuickDicBuilder. Here's how to use it:
  • Download and unpack: QuickDicBuilder.zip
  • Edit QuickDicBuilder.cmd and set JAVA_EXE to point to the Java binary on your system.
  • QuickDicBuilder can now be called just like any other command-line utility.
Note: Thad Hughes and Reimar Döffinger are the original authors; I am only redistributing this. For source code, please refer to the GitHub links above.

Creating Dictionaries: How to Use It

The dictionary generation tool is functional but not very well documented. Some extra information on how it is supposed to be used can be obtained by reading old, closed GitHub issues and the source code.

The utility supports several input formats: "Wiktionary", "tab_separated", and "Chemnitz". The last of these follows the format of several German dictionaries available here. Tab-separated is the most straightforward format to use. Perhaps it's best to illustrate how to use it by example.

Case #1: Dict.cc

Dict.cc dictionaries can be downloaded (for personal use) from:
https://www1.dict.cc/translation_file_request.php

I downloaded their Russian-English dictionary, and converted it to QuickDic format with the following command:

QuickDicBuilder --dictInfo="Dict.cc Russian-English" --dictOut="RU-EN_DictCC.quickdic" --input1="dictcc.ru-en.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="dictcc" --lang1="RU" --lang1Stoplist="StopLists\xx.txt" --lang2="EN"

I did not have a Russian stoplist, so I used an empty one. Stoplists contain frequently appearing words that should be dropped from the index. It would probably be better to use a real one.

This conversion is relatively easy because the format of the downloaded file follows what the utility expects as its "tab_separated" input.
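Before running the converter on a large file, it can help to check that every line really has the shape "headword<Tab>definition". The following is only a minimal Python sketch, not part of QuickDicBuilder. The function name is made up for illustration, and skipping blank lines and lines starting with "#" is my assumption about what can safely be ignored, not documented tool behavior.

```python
def check_tab_separated(lines):
    """Return (ok_count, bad_line_numbers) for an iterable of text lines."""
    ok = 0
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines can be ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A usable entry needs a non-empty headword and a non-empty definition.
        if len(fields) >= 2 and fields[0].strip() and fields[1].strip():
            ok += 1
        else:
            bad.append(number)
    return ok, bad


sample = [
    "# comment header\n",
    "кошка\tcat\tnoun\n",
    "no tab on this line\n",
    "\n",
    "собака\tdog\n",
]
print(check_tab_separated(sample))
```

For a real file, pass an open file object instead of the sample list, for example open("dictcc.ru-en.txt", encoding="utf-8").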

Case #2: CC-CEDICT

CC-CEDICT is a Chinese-English dictionary that can be downloaded from:
https://www.mdbg.net/chinese/dictionary?page=cc-cedict

Here, the conversion command was:

QuickDicBuilder --dictInfo="CC-CEDICT Chinese-English" --dictOut="CC-CEDICT.quickdic" --input1="cedict_ts.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="cc-cedict" --lang1="ZH" --lang1Stoplist="StopLists\xx.txt" --lang2="EN" --lang2Stoplist="StopLists\en.txt"

However, the input data needed to be rearranged first from:
TraditionalHeadword SimplifiedHeadword [Pronunciation] /Definition/
to:
TraditionalHeadword SimplifiedHeadword<Tab>Definition /Pronunciation/

For this purpose I used the following regular expression with sed:

sed -e "s/^ *\([^ ]*\) \([^ ]*\) *\[ *\([^]]*\) *\] *\/ *\(.*\) *\/.*$/\1 \2\t\4 \/\3\//g" cedict_ts.u8 > cedict_ts.txt
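For anyone without sed at hand, the same rearrangement can be sketched in Python. This is a rough equivalent under my own assumptions, not the exact command used above: the function name is hypothetical, the pronunciation is matched only up to the first closing bracket, and lines that do not look like entries (such as the "#" header lines) return None so they can be dropped.

```python
import re

# Two headword forms, then [pronunciation], then /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def cedict_to_tab_separated(line):
    """Return 'headwords<TAB>glosses /pronunciation/', or None for non-entries."""
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    headword1, headword2, pronunciation, glosses = match.groups()
    return f"{headword1} {headword2}\t{glosses} /{pronunciation}/"


print(cedict_to_tab_separated("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/"))
```

To convert the whole file, apply the function to each line of cedict_ts.u8 (opened with encoding="utf-8") and write every result that is not None to cedict_ts.txt.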

Results

This was done quickly just to check if it works but if you want to, you can download the dictionary files I generated.
Old 02-11-2020, 09:42 AM   #2
Morioh
Member
Morioh began at the beginning.
 
Posts: 15
Karma: 10
Join Date: May 2018
Device: Tolino shine
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.
Old 02-25-2020, 08:06 AM   #3
Peripathetic
Enthusiast
Peripathetic composes epic poetry in binary.
 
Posts: 38
Karma: 90402
Join Date: Feb 2019
Device: Tolino Shine 3
Quote:
Originally Posted by Morioh View Post
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.
JMdict is an XML file you'd have to parse. This would be an extra step.

But it seems the same data is also available as a "legacy" EDICT download:
http://ftp.monash.edu/pub/nihongo/edict.zip

The EDICT version is a plain-text file in a JIS encoding (EUC-JP). So all you'd have to do is convert it to UTF-8, and then you can transform it with regular expressions like I did with sed for CC-CEDICT.
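As a rough illustration of those two steps, here is a small Python sketch. It is untested against the real download and rests on assumptions: that the file decodes as EUC-JP (Python's "euc_jp" codec), and that entries look like "headword [reading] /gloss/gloss/" with the reading part optional. The function names are hypothetical.

```python
import re

# headword, optional [reading], then /gloss 1/gloss 2/
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]*)\])? /(.*)/\s*$")


def edict_to_tab_separated(line):
    """Return 'headword<TAB>glosses /reading/', or None for non-entries."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    headword, reading, glosses = match.groups()
    if reading:
        return f"{headword}\t{glosses} /{reading}/"
    return f"{headword}\t{glosses}"


def convert_edict_bytes(data, encoding="euc_jp"):
    """Decode raw EDICT bytes and return the tab-separated rows as a list."""
    text = data.decode(encoding, errors="replace")
    rows = (edict_to_tab_separated(line) for line in text.splitlines())
    return [row for row in rows if row is not None]


sample = "日本語 [にほんご] /(n) Japanese (language)/\nこれ /(pn) this/\n".encode("euc_jp")
for row in convert_edict_bytes(sample):
    print(row)
```

For the real file, read it in binary mode, pass the bytes to convert_edict_bytes, and write the rows to a UTF-8 text file that can then be fed to QuickDicBuilder as tab_separated input.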
Old 02-25-2020, 10:18 AM   #4
Morioh
Member
Morioh began at the beginning.
 
Posts: 15
Karma: 10
Join Date: May 2018
Device: Tolino shine
Thank you for the mention, but I should have said that I'm next to completely code illiterate.
So this is a pretty cool tool, but I cannot use it.
Though I'm quite happy that even the Tolino has a dedicated way to make custom dictionaries, since someone can get a bit of fun and use out of this.
P.S. Actually, my Tolino is not even capable of selecting text, so it's pointless ^^

Last edited by Morioh; 02-25-2020 at 10:54 AM.
Old 04-05-2020, 06:45 AM   #5
oliverdb
Junior Member
oliverdb began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Apr 2020
Device: Tolino 3 Shine
Hi,
first, I would like to thank @peripathetic for his/her amazing work. Thank you!

I managed to install TWRP, boot, and the modified EPubProd. I also modified the file assets/environments/app.properties.prod so as to keep the connection to the Thalia shop, as I still use it to keep my library there.

But I could not figure out where I should put the dictionaries! Everybody speaks about .tolino/dictionaries/ but I do not have that folder, and still I know I have some dictionaries installed.

Can someone shed some light on this issue? It is probably very simple, but I cannot figure it out!

Regards,
Old 04-05-2020, 04:51 PM   #6
Peripathetic
Enthusiast
Peripathetic composes epic poetry in binary.
 
Posts: 38
Karma: 90402
Join Date: Feb 2019
Device: Tolino Shine 3
Quote:
Originally Posted by oliverdb View Post
But I could not figure out where I should put the dictionaries! Everybody speaks about .tolino/dictionaries/ but I do not have that folder, and still I know I have some dictionaries installed.
When you connect the Tolino to your computer via USB, if you followed my customization guide to change the connection mode to MTP, it should look like this:

[Attached image: Tolino Dictionaries.png]

If you left the default settings, it will show up as a Mass Storage Device with its own drive letter (like D:).
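If the device shows up with a drive letter, copying a dictionary into place can also be scripted. The sketch below is only an illustration with a hypothetical helper name. It assumes Mass Storage mode (MTP does not give you an ordinary file path) and that the .tolino/dictionaries folder already exists on the device.

```python
import shutil
from pathlib import Path


def install_dictionary(quickdic_file, device_root):
    """Copy a .quickdic file into .tolino/dictionaries under device_root."""
    source = Path(quickdic_file)
    if source.suffix != ".quickdic":
        raise ValueError(f"not a .quickdic file: {source}")
    target_dir = Path(device_root) / ".tolino" / "dictionaries"
    # Refuse to guess: the folder should already be there on a mounted Tolino.
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} not found; is the Tolino mounted?")
    return Path(shutil.copy2(source, target_dir))


# Example call (the drive letter is an assumption, adjust to your system):
# install_dictionary("RU-EN_DictCC.quickdic", "D:/")
```

The function returns the path of the copied file, so a script can print or log where the dictionary ended up.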
Old 04-06-2020, 07:00 AM   #7
oliverdb
Junior Member
oliverdb began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Apr 2020
Device: Tolino 3 Shine
Wow, it could not be easier. Thank you again!
Old 05-24-2020, 06:24 AM   #8
AnimalOfArt
Groupie
AnimalOfArt ought to be getting tired of karma fortunes by now.
 
Posts: 175
Karma: 1044642
Join Date: Jun 2017
Device: changing frequently
Is it possible to convert a StarDict dictionary to a file that Toligen can convert to QuickDic v6?

EDIT: Never mind. I used pyglossary for this.

Now, thanks to PocketBookDic (https://github.com/Markismus/PocketBookDic), I have the Kindle Duden converted from mobi to StarDict, converted that with pyglossary to a tab-delimited text file, and then with Toligen to QuickDic!

EDIT: Unfortunately, neither dictionary works.

Last edited by AnimalOfArt; 05-28-2020 at 01:57 PM.
Old 01-13-2021, 02:34 AM   #9
toancv
Connoisseur
toancv began at the beginning.
 
Posts: 69
Karma: 10
Join Date: Nov 2018
Device: Kindle paperwhite, Likebook Mars, Kobo Aura Ed. 2, Kobo Touch
Hi, I successfully created an English-Vietnamese dictionary but have a font problem: the font used by the translate function does not display Vietnamese correctly. Can anyone help? Thanks!

Last edited by toancv; 01-14-2021 at 02:57 AM.