
MobileRead Forums > E-Book Readers > Tolino

02-09-2020, 04:47 PM   #1
Peripathetic
Enthusiast
Peripathetic composes epic poetry in binary.
 
Posts: 30
Karma: 90402
Join Date: Feb 2019
Device: Tolino Shine 3
QuickDicBuilder: Custom dictionaries on the Tolino

Dictionaries used by the Tolino app are stored under .tolino/dictionaries/ on the user data partition. The format used is that of QuickDic (*.quickdic).
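If the reader's user-data partition shows up as a drive when you connect it over USB, installing a dictionary is just a file copy into that folder. A minimal Python sketch, assuming you pass in the partition's mount point yourself (device_root is a hypothetical parameter; on Windows it would be the drive letter the reader gets):

```python
"""Copy a .quickdic file into the Tolino's dictionary folder (sketch)."""
import shutil
from pathlib import Path


def install_dictionary(dict_file: str, device_root: str) -> Path:
    """Copy dict_file into <device_root>/.tolino/dictionaries/.

    device_root is assumed to be the mount point of the reader's
    user-data partition; it is not detected automatically.
    """
    src = Path(dict_file)
    if src.suffix != ".quickdic":
        raise ValueError(f"expected a *.quickdic file, got {src.name}")
    target_dir = Path(device_root) / ".tolino" / "dictionaries"
    # The folder name starts with a dot, so it may be hidden in file
    # managers; create it if it is not there yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / src.name
    shutil.copy2(src, dest)  # copy2 also keeps the file timestamps
    return dest
```

For example, install_dictionary("RU-EN_DictCC.quickdic", "E:\\") if the reader is mounted as drive E:.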

Existing Dictionaries

The original QuickDic was an Android app written by Thad Hughes and eventually open-sourced. Its dictionary files were hosted on Google Code, but they were deleted and apparently lost when Google shut the service down. A Web Archive snapshot of the project repository exists, but the files cannot be downloaded from it.

The project was later resurrected as QuickDic Restored by Reimar Döffinger. His repository contains many dictionaries generated from Wiktionary, a sister project of Wikipedia, which was also the source of the original QuickDic data. However, as part of his work on the app he revised the dictionary format, so newer dictionaries (v007 rather than v006) are not compatible with the Tolino.

These Wiktionary-based dictionaries can be downloaded from GitHub. Make sure to download only the files labeled v006.
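If you are unsure which format a .quickdic file uses, you can read the version number from its header. This is a sketch based on my assumption about the layout written by QuickDic's Dictionary class: a big-endian 32-bit version number, a 64-bit creation time in milliseconds, then the dictionary info string as written by Java's writeUTF (2-byte length, then the bytes). The function names are hypothetical.

```python
"""Report the format version of a .quickdic file (sketch)."""
import struct
from datetime import datetime, timezone

# Per this post: v006 files work on the Tolino, v007 files do not.
TOLINO_MAX_VERSION = 6


def read_quickdic_header(path):
    """Return (version, created, info) from the start of a .quickdic file.

    Assumed layout: int32 version, int64 creation time in ms since the
    epoch, then a Java writeUTF string (uint16 length + bytes).
    """
    with open(path, "rb") as f:
        version, created_ms = struct.unpack(">iq", f.read(12))
        (info_len,) = struct.unpack(">H", f.read(2))
        # Java's "modified UTF-8" matches UTF-8 for ordinary text;
        # errors="replace" covers the rare cases where they differ.
        info = f.read(info_len).decode("utf-8", errors="replace")
    created = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
    return version, created, info


def works_on_tolino(path):
    """True if the header version is v006 or older."""
    version, _, _ = read_quickdic_header(path)
    return version <= TOLINO_MAX_VERSION
```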

Creating Dictionaries: The Tool

DictionaryPC is a Java tool, accompanying the QuickDic app, for generating QuickDic dictionaries. GitHub user Gitsaibot authored shell scripts for generating QuickDic dictionaries specifically with the Tolino in mind (the .jar file there is exactly the same as in the original project).

Since it is a Java application, it needs a Java Runtime Environment (JRE) to run; a portable version works too. It also requires the following libraries: Apache Commons Compress, Apache Commons Lang3, International Components for Unicode (ICU4J), and Apache Xerces-J (xercesImpl).

For convenience, I packaged everything necessary to run it in a Windows environment into a single archive, which I named QuickDicBuilder. Here's how to use it:
  • Download and unpack: QuickDicBuilder.zip
  • Edit QuickDicBuilder.cmd and set JAVA_EXE to point to the Java binary on your system.
  • QuickDicBuilder can now be called just like any other command-line utility.
Note: Thad Hughes and Reimar Döffinger are the original authors; I am only redistributing their work. For source code, please refer to the GitHub links above.

Creating Dictionaries: How to Use It

The dictionary generation tool is functional but not very well documented. Additional information on how it is meant to be used can be gleaned from old, closed GitHub issues and from its source code.

The utility supports several input formats: "Wiktionary", "tab_separated", and "Chemnitz". The last one follows the format of several German dictionaries available here. Tab-separated is the most straightforward of the three, and it is easiest to explain by example.
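If your source data is in some other shape, you can write the tab-separated file yourself. A minimal Python sketch, assuming the layout used in the CC-CEDICT case in this post: one entry per line, the lang1 text, a tab, then the lang2 text, saved as UTF-8 to match --input1Charset=UTF8. The function names are hypothetical.

```python
"""Write (lang1, lang2) pairs as a tab-separated input file (sketch)."""


def clean(field: str) -> str:
    # A tab or line break inside a field would break the
    # one-entry-per-line, two-column layout, so collapse all
    # whitespace runs to single spaces.
    return " ".join(field.split())


def write_tab_separated(pairs, path):
    """Write one 'lang1 text<TAB>lang2 text' line per pair, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for lang1, lang2 in pairs:
            a, b = clean(lang1), clean(lang2)
            if a and b:  # skip entries where one side is empty
                out.write(f"{a}\t{b}\n")
```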

Case #1: Dict.cc

Dict.cc dictionaries can be downloaded (for personal use) from:
https://www1.dict.cc/translation_file_request.php

I downloaded their Russian-English dictionary and converted it to QuickDic format with the following command:

QuickDicBuilder --dictInfo="Dict.cc Russian-English" --dictOut="RU-EN_DictCC.quickdic" --input1="dictcc.ru-en.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="dictcc" --lang1="RU" --lang1Stoplist="StopLists\xx.txt" --lang2="EN"
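To convert several files in a row, it can help to build the command line from a script. A minimal Python sketch that only uses the options appearing in this post; I am assuming the option order does not matter, that --lang2Stoplist mirrors --lang1Stoplist, and that the wrapper is started as QuickDicBuilder.cmd (all function names are hypothetical).

```python
"""Run QuickDicBuilder from Python with the options used above (sketch)."""
import subprocess


def builder_args(dict_info, dict_out, input_file, input_name,
                 lang1, lang2, lang1_stoplist=None, lang2_stoplist=None,
                 input_format="tab_separated", charset="UTF8"):
    """Assemble the argument list for a single input file."""
    args = [
        f"--dictInfo={dict_info}",
        f"--dictOut={dict_out}",
        f"--input1={input_file}",
        f"--input1Charset={charset}",
        f"--input1Format={input_format}",
        f"--input1Name={input_name}",
        f"--lang1={lang1}",
        f"--lang2={lang2}",
    ]
    if lang1_stoplist:
        args.append(f"--lang1Stoplist={lang1_stoplist}")
    if lang2_stoplist:
        args.append(f"--lang2Stoplist={lang2_stoplist}")
    return args


def run_builder(args, exe="QuickDicBuilder.cmd"):
    # Passing a list lets subprocess quote values containing spaces;
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run([exe, *args], check=True)
```

The Dict.cc command above corresponds to run_builder(builder_args("Dict.cc Russian-English", "RU-EN_DictCC.quickdic", "dictcc.ru-en.txt", "dictcc", "RU", "EN", lang1_stoplist="StopLists\\xx.txt")).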

I did not have a Russian stoplist, so I used an empty one. A stoplist contains frequently occurring words that should be left out of the index, so using a proper one would probably give better results.
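If no stoplist exists for your language, one rough starting point is to take the most frequent words of the lang1 column and prune the list by hand. A minimal Python sketch; I am assuming the stoplist format is one word per line in UTF-8 (compare with the files in the StopLists folder), and the function names are hypothetical.

```python
"""Draft a stoplist from the most frequent lang1 words (sketch)."""
from collections import Counter


def draft_stoplist(lines, top_n=50):
    """Return the top_n most frequent lowercase words of the first column.

    lines: iterable of 'lang1<TAB>lang2' strings. Splitting on whitespace
    is a crude tokenizer: fine for Russian or English, useless for Chinese.
    """
    counts = Counter()
    for line in lines:
        first = line.split("\t", 1)[0]
        counts.update(word.lower() for word in first.split())
    return [word for word, _ in counts.most_common(top_n)]


def write_stoplist(words, path):
    # Assumed format: one word per line, UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
```

Frequency alone will also pick up common content words, so review the draft before using it.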

This conversion is relatively easy because the format of the downloaded file follows what the utility expects as its "tab_separated" input.

Case #2: CC-CEDICT

CC-CEDICT is a Chinese-English dictionary that can be downloaded from:
https://www.mdbg.net/chinese/dictionary?page=cc-cedict

Here, the conversion command was:

QuickDicBuilder --dictInfo="CC-CEDICT Chinese-English" --dictOut="CC-CEDICT.quickdic" --input1="cedict_ts.txt" --input1Charset=UTF8 --input1Format=tab_separated --input1Name="cc-cedict" --lang1="ZH" --lang1Stoplist="StopLists\xx.txt" --lang2="EN" --lang2Stoplist="StopLists\en.txt"

However, the input data first had to be rearranged from:
TraditionalHeadword SimplifiedHeadword [Pronunciation] /Definition/
to:
TraditionalHeadword SimplifiedHeadword<Tab>Definition /Pronunciation/

For this purpose I used the following regular expression with sed:

sed -e "s/^ *\([^ ]*\) \([^ ]*\) *\[ *\([^]]*\) *\] *\/ *\(.*\) *\/.*$/\1 \2\t\4 \/\3\//g" cedict_ts.u8 > cedict_ts.txt
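For anyone without sed, here is a Python sketch of the same rearrangement. It assumes the CC-CEDICT line layout "Traditional Simplified [pinyin] /def/def/", limits the pinyin match to the first pair of brackets so that classifier notes such as "CL:本[ben3]" inside the definitions stay intact, and skips the "#" comment lines at the top of the file. The function names are my own.

```python
"""Rearrange CC-CEDICT lines into tab-separated input (sketch)."""
import re

# Traditional, Simplified, [pinyin], then /definition/definition/.../
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def cedict_line_to_tab(line):
    """Return 'Trad Simp<TAB>definitions /pinyin/', or None for non-entries."""
    if line.startswith("#"):
        return None  # header comments at the top of the file
    m = CEDICT_LINE.match(line)
    if not m:
        return None
    trad, simp, pinyin, defs = m.groups()
    return f"{trad} {simp}\t{defs} /{pinyin}/"


def convert_cedict(src="cedict_ts.u8", dst="cedict_ts.txt"):
    # utf-8-sig also tolerates a byte-order mark, should one be present.
    with open(src, encoding="utf-8-sig") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            out = cedict_line_to_tab(line.rstrip("\r\n"))
            if out is not None:
                fout.write(out + "\n")
```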

Results

I did this quickly, just to check that it works, but if you want, you can download the dictionary files I generated.
02-11-2020, 09:42 AM   #2
Morioh
Member
Morioh began at the beginning.
 
Posts: 13
Karma: 10
Join Date: May 2018
Device: Tolino shine
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.
02-25-2020, 08:06 AM   #3
Peripathetic
Enthusiast
Peripathetic composes epic poetry in binary.
 
Posts: 30
Karma: 90402
Join Date: Feb 2019
Device: Tolino Shine 3
Quote:
Originally Posted by Morioh
This looks really cool. Sadly, I don't have the technical expertise to create a ja-en dictionary from JMdict.
JMdict is an XML file, so you would have to parse it first, which adds an extra step.
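That parsing step can be fairly short. A minimal Python sketch using the standard library's streaming XML parser; it assumes the usual JMdict structure (entry elements containing keb, reb and gloss elements, with non-English glosses marked by xml:lang), and the output layout mirrors the CC-CEDICT conversion: all headwords, a tab, then the glosses. The function name is hypothetical.

```python
"""Turn JMdict XML entries into tab-separated lines (sketch)."""
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def jmdict_to_tab(source):
    """Yield 'kanji kana<TAB>gloss/gloss/...' for each <entry> in source.

    source: a path or binary file object. Only English glosses are kept
    (no xml:lang attribute, or xml:lang="eng").
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entry":
            continue
        headwords = [k.text for k in elem.iter("keb")] + \
                    [r.text for r in elem.iter("reb")]
        glosses = [g.text for g in elem.iter("gloss")
                   if g.get(XML_LANG, "eng") == "eng" and g.text]
        if headwords and glosses:
            yield " ".join(headwords) + "\t" + "/".join(glosses)
        elem.clear()  # drop the parsed children to keep memory use down
```

Writing each yielded line plus "\n" to a UTF-8 file gives input for --input1Format=tab_separated.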

But it seems the same data is also available as a "legacy" EDICT download:
http://ftp.monash.edu/pub/nihongo/edict.zip

The EDICT version is a plain-text file encoded in EUC-JP. All you would have to do is convert it to UTF-8 and then transform it with regular expressions, as I did with sed for CC-CEDICT.
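Both steps (re-encoding and rearranging) fit in one small Python sketch. It assumes EDICT lines look like "KANJI [KANA] /gloss/gloss/", or "KANA /gloss/" for kana-only words, and puts the word and its reading in the first column so both get indexed, similar to the CC-CEDICT layout. The default file names and the function names are hypothetical; use whatever the unpacked archive contains.

```python
"""Convert the legacy EDICT file (EUC-JP) into tab-separated UTF-8 (sketch)."""
import re

# KANJI [KANA] /gloss/gloss/   or, for kana-only words,   KANA /gloss/
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]+)\])? /(.*)/\s*$")


def edict_line_to_tab(line):
    """Return 'kanji kana<TAB>glosses' (or 'kana<TAB>glosses'), else None."""
    m = EDICT_LINE.match(line)
    if not m:
        return None
    word, reading, glosses = m.groups()
    head = f"{word} {reading}" if reading else word
    return f"{head}\t{glosses}"


def convert_edict(src="edict", dst="edict.txt"):
    # Decode from EUC-JP, write UTF-8; errors="replace" keeps going if a
    # byte sequence cannot be decoded.
    with open(src, encoding="euc_jp", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            out = edict_line_to_tab(line.rstrip("\r\n"))
            if out:
                fout.write(out + "\n")
```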
02-25-2020, 10:18 AM   #4
Morioh
Member
Morioh began at the beginning.
 
Posts: 13
Karma: 10
Join Date: May 2018
Device: Tolino shine
Thank you for the mention, but I should have said that I'm almost completely code-illiterate.
So this is a pretty cool tool, but I cannot use it.
Still, I'm quite happy that even the Tolino has a dedicated way to make custom dictionaries, since someone can get a bit of fun and use out of this.
P.S. Actually, my Tolino cannot even select text, so it's pointless for me ^^

Last edited by Morioh; 02-25-2020 at 10:54 AM.

Tags
dictionarypc, quickdic, tolino


