|02-02-2014, 07:09 PM||#1|
Join Date: Feb 2014
DE-PL Kobo Dictionary
As the German-German Kobo dictionary sucks compared to what you can get for free (and even compared to the German-English Kobo dictionary... meh), I decided to make my own.
I decided not to crawl pons.eu for lack of time (though it's feasible), and instead used the word database that is free to download from http://www.depl.pl/ .
The resulting dictionary is in the attachment.
What I needed to do beyond what the tutorials tell you to do:
1) this particular word database has a strange format: it's like TAB, but uses a double space instead of the tab character (sed 's/  /\t/' a.utf8 > depl.tab)
stardict-editor created the StarDict files from the TAB file as soon as I removed the repeated entries (the program complained verbosely about what was wrong)
2) 'python2.7 penelope.py -f de -t pl -p depl --output-kobo' prepared the *.html files and applied marisa to them successfully, but then failed to create the 'words' file and zipped the result with the wrong filename encoding
3) 'cut -f1 depl.tab > words' cured the first problem
4) I needed to pack the files using the Windows version of 7zip (run under Wine), as Linux 7z, p7zip and zip failed to produce the shitty encoding (i.e. they encoded filenames in UTF-8 instead of whatever ancient code page zip uses). This turned out to be wrong, please read below.
The dictionary has been installed over some Italian-English one which I doubt I will ever use.
I tried inserting the de-pl one into the sqlite database. 'Manage dictionaries' shows the dict correctly, but the reader must have the list built in somewhere in libnickel1.0.0 (I don't have a toolchain to play with it, but with a binary editor you can see a hard-coded list of dictionaries...)
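Steps 1 and 3 above (plus the removal of repetitions) could also be done in one pass. This is only a minimal sketch, not what I actually ran: the function names are made up for illustration, it assumes "repetition" means a repeated headword (the first occurrence is kept), and it silently skips lines that contain no double space.

```python
def doublespace_to_pairs(lines):
    """Yield (headword, definition) pairs from 'headword  definition' lines.

    Assumptions (not from the original post): the first double space is the
    separator, and a repeated headword counts as a repetition to drop.
    """
    seen = set()
    for raw in lines:
        line = raw.rstrip("\n")
        head, sep, definition = line.partition("  ")
        if not sep or head in seen:
            # No double space found, or headword already emitted: skip.
            continue
        seen.add(head)
        yield head, definition


def write_outputs(pairs, tab_path, words_path):
    """Write the TAB file for stardict-editor and the headword-only 'words' file."""
    with open(tab_path, "w", encoding="utf-8") as tab, \
            open(words_path, "w", encoding="utf-8") as words:
        for head, definition in pairs:
            tab.write(f"{head}\t{definition}\n")
            words.write(f"{head}\n")
```

Usage would be something like opening a.utf8 with encoding="utf-8" and calling write_outputs(doublespace_to_pairs(f), "depl.tab", "words").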
(As for the license - the webpage states that you can copy at will unless you alter the software or want any cash: http://www.depl.pl/licencja.html . I assume this does not cover changing file format, especially as the website offers the dictionary in Kindle format for free)
I got confused by the encoding....
The Windows version of 7zip created an archive I could list and extract so that all ö and ß were intact. Linux zip and (p)7z(ip) created an archive that, when listed or extracted, replaced all national characters with garbage (e.g. zö.html → zĂÂ.html).
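This kind of garbage usually appears when filename bytes written in one encoding are read back in another. A small sketch of the effect, assuming the name was stored as UTF-8 and read with a single-byte code page; CP437 is only an illustrative guess here, the exact garbage depends on which code page the listing tool assumes, and both function names are hypothetical.

```python
def misdecode(name, actual="utf-8", assumed="cp437"):
    """Show how a name stored in `actual` looks when read as `assumed`."""
    return name.encode(actual).decode(assumed)


def repair(garbled, actual="utf-8", assumed="cp437"):
    """Undo misdecode(): re-encode with the wrong code page, decode correctly."""
    return garbled.encode(assumed).decode(actual)


garbled = misdecode("zö.html")
# 'ö' is two bytes in UTF-8, so it turns into two junk characters,
# while pure-ASCII names such as 'un.html' come through unchanged.
print(garbled, "->", repair(garbled))
```

The bytes in the archive are the same either way; only the interpretation differs, which is why the same zip can look broken on a PC and fine elsewhere.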
But Kobo did not recognize the correctly encoded zip (i.e. the one that extracted fine), and accepted the shitty one instead. Results with the Windows 7z archive: unmöglich was in un.html and its definition was displayed, but möglich was in mö.html and was not displayed.
What I find strange is that the dicts shipped with Kobo work correctly both on the device and on Linux/Windows. To check whether the encoding is fine on the Kobo, you can probably telnet in and run unzip -l (at least this worked in my case). I've replaced the dict in the attachment with a corrected one...
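On the PC side, one way to see how an archive labels its filenames is to look at the "UTF-8 name" flag (general purpose bit 11, 0x800) of each entry. A hedged sketch with Python's zipfile module: it only reports what the archive claims and says nothing about what Kobo accepts. The helper names and the demo archive are made up for illustration.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def list_name_encodings(zip_source):
    """Return (filename, flagged_as_utf8) for every entry in the archive.

    Python decodes unflagged names as CP437, so a name that was stored in
    some other code page without the flag will show up as garbage here.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
                for info in zf.infolist()]


def build_demo_zip():
    """Build a tiny in-memory archive with one ASCII and one non-ASCII name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("un.html", "demo")
        zf.writestr("mö.html", "demo")
    buf.seek(0)
    return buf


for name, is_utf8 in list_name_encodings(build_demo_zip()):
    print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

For a real dictionary, pass the path of the zip (e.g. dicthtml-de-pl.zip) instead of the demo buffer and compare the output for a working and a non-working archive.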
tl;dr: "zip dicthtml-de-pl.zip *" works in Kobo even if äöüß are garbage on PC
Last edited by User_Name; 02-05-2014 at 08:06 AM. Reason: dict update - I mixed up encodings prev.
|03-16-2014, 09:14 AM||#2|
Join Date: Mar 2014
Thank you for the dictionary. It is great! Do you know any websites where I could learn how to create a new Kobo dictionary? I would need some sort of step-by-step instructions to create a Spanish-English dictionary.
I would appreciate your help.