Old 11-19-2021, 05:48 AM   #1
MrBeef12
Junior Member
MrBeef12 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2021
Device: Kindle Paperwhite 7th Generation
Creating a Kindle dictionary with inflections

Hello everyone!

I am trying to create a Russian dictionary with proper inflections sourced from Wiktionary, because the existing dictionaries are not very good.

There is some documentation about this provided by Amazon. It basically says that you should:

1) Create an XHTML file with their special markup specifying all inflections etc.
2) Turn it into an epub
3) Open it with Kindle Previewer
4) Export it with Kindle Previewer to MOBI
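As a rough illustration of step 1, here is a minimal Python sketch that renders one entry with its inflections. The `idx:entry`, `idx:orth`, `idx:infl` and `idx:iform` elements come from Amazon's dictionary markup; the function name `build_entry`, the index name "default" and the sample words are assumptions made for this example, not part of any existing tool.

```python
from xml.sax.saxutils import escape, quoteattr


def build_entry(headword, definition, inflections):
    """Return one dictionary entry as an XHTML fragment (hypothetical helper)."""
    # Each inflected form becomes an <idx:iform> inside <idx:infl>.
    iforms = "".join(
        f"<idx:iform value={quoteattr(form)} />" for form in inflections
    )
    infl = f"<idx:infl>{iforms}</idx:infl>" if inflections else ""
    return (
        # "default" is an assumed index name; it has to match the OPF metadata.
        '<idx:entry name="default" scriptable="yes" spell="yes">'
        f"<idx:orth value={quoteattr(headword)}><b>{escape(headword)}</b>"
        f"{infl}</idx:orth>"
        f"<p>{escape(definition)}</p>"
        "</idx:entry>"
    )


if __name__ == "__main__":
    print(build_entry("дом", "house", ["дома", "дому", "домом"]))
```

Escaping the headword and definition matters here, because a single stray `&` or `<` in Wiktionary data is enough to make the whole XHTML file invalid.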

So I created a large XHTML file (23 MB or so) according to the Amazon specifications and opened it in Kindle Previewer, and it looked fine. However, Kindle Previewer does not let you export XHTML files to MOBI. They want you to create an intermediate epub file.

I tried using Pandoc to do the conversion, which did not work because it stripped out all the dictionary-specific HTML tags and left only paragraphs. Then I tried calibre. The normal XHTML -> epub conversion failed with an error message saying that the XHTML file was too large. calibre suggests turning on the "heuristic mode" if you run into this error, which I tried, but the conversion still had not finished after hours of runtime.

Then I attempted to create the epub file myself, using a sample file taken from this tutorial. I discovered that this is not trivial, and a check using epubcheck revealed many hard-to-understand errors in my generated file. Generating the epub file is further complicated by the fact that you probably need to split the XHTML into many smaller files, perhaps around 250 KB each, because e-readers tend to struggle with parsing larger files.
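The splitting itself can be sketched in a few lines of Python. This is only an illustration under assumptions: `chunk_entries` is a hypothetical helper, the entries are assumed to be already rendered as strings, and the 250 KB budget is the rough figure mentioned above, not a documented limit. The size of the XHTML wrapper around each chunk is not counted.

```python
def chunk_entries(entries, max_bytes=250_000):
    """Group rendered entries so each group stays under max_bytes of UTF-8."""
    chunks, current, size = [], [], 0
    for entry in entries:
        # Cyrillic letters take two bytes in UTF-8, so measure bytes, not characters.
        entry_size = len(entry.encode("utf-8"))
        # Start a new chunk when adding this entry would exceed the budget.
        if current and size + entry_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = ["<p>" + "слово " * 50 + "</p>"] * 20
    for number, chunk in enumerate(chunk_entries(demo, max_bytes=2_000)):
        print(f"part{number:04d}.xhtml: {len(chunk)} entries")
```

An entry is never cut in half: one that is larger than the budget simply ends up in a chunk of its own.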

So I thought there might be an easier way to do this, or a library that helps with it. Maybe it would even be a good idea to output the words and inflections in some other, simpler dictionary format and then convert that to MOBI using an existing library, skipping the XHTML generation completely. Currently I am using Python, but I would also use other languages if necessary. What could I try?

There is an apparently closed-source script here that unfortunately doesn't support inflections, so it does not work for me. And there are instructions here that advise converting the file to PRC using Mobipocket Creator and then opening it with Kindle Previewer. The problem with this approach is that Kindle Previewer throws the error:

> Kindle Previewer does not support this file, which has either been created using an older version of KindleGen or a third party application. We recommend using EPUB or DOCX format directly for previewing and publishing your book on Kindle.

There are also more detailed instructions for Mobipocket Creator here, which tell you to move the generated .prc file directly onto the Kindle. I tried that, but the file is not recognized as a dictionary.
Old 11-20-2021, 11:26 AM   #2
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
Posts: 5,300
Karma: 19777777
Join Date: Dec 2010
Device: Kindle PW2
Unless you're trying to create a Russian dictionary with a target language other than English, you might want to consider registering your Kindle and downloading the Russian ABBYY dictionaries, which all owners of registered Kindles can download for free.

The following dictionaries are available for download:

ABBYY Lingvo Большой Русско-Английский Словарь (Russian->English)
ABBYY Lingvo Большой Англо-Русский Словарь (English->Russian)
ABBYY Lingvo Большой Толковый Словарь Русского Языка (Russian<->Russian)

If you really want to create a dictionary from scratch, see this post for a proof-of-concept dictionary.
For more information also see the How to create your own mobipocket dictionary for any language thread.

FYI: Kindlegen is included in Kindle Previewer:
Windows: %LOCALAPPDATA%\Amazon\Kindle Previewer 3\lib\fc\bin\kindlegen.exe
macOS: /Applications/Kindle Previewer 3.app/Contents/lib/fc/bin/kindlegen
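Since the original poster is working in Python, the bundled kindlegen could be driven from a script roughly like this. It is a sketch under assumptions: the helper names and the file names `dictionary.opf` and `dictionary.mobi` are made up for illustration, and only kindlegen's `-o` option (output file name, without a directory) is used.

```python
import os
import platform
import subprocess
from pathlib import Path


def kindlegen_path(system=None):
    """Locate the kindlegen binary bundled with Kindle Previewer 3."""
    system = system or platform.system()
    if system == "Windows":
        base = Path(os.environ.get("LOCALAPPDATA", ""))
        return (base / "Amazon" / "Kindle Previewer 3"
                / "lib" / "fc" / "bin" / "kindlegen.exe")
    if system == "Darwin":
        return Path("/Applications/Kindle Previewer 3.app"
                    "/Contents/lib/fc/bin/kindlegen")
    raise RuntimeError(f"no known kindlegen location for {system}")


def build_command(opf_file, output_name, system=None):
    """Build the argument list; -o takes a bare file name, not a path."""
    return [str(kindlegen_path(system)), str(opf_file), "-o", output_name]


def run_kindlegen(opf_file, output_name):
    """Run kindlegen and return its exit code and console output."""
    result = subprocess.run(
        build_command(opf_file, output_name),
        capture_output=True, text=True,
    )
    # No check=True on purpose: inspect the exit code and log yourself,
    # because a non-zero code does not necessarily mean no file was written.
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical file names, shown without actually running kindlegen.
    print(build_command("dictionary.opf", "dictionary.mobi", system="Darwin"))
```

The output file is written next to the input .opf file.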

Last edited by Doitsu; 11-20-2021 at 11:41 AM.
Old 11-21-2021, 12:59 PM   #3
MrBeef12
Junior Member
MrBeef12 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2021
Device: Kindle Paperwhite 7th Generation
Thank you very much for your help, Doitsu! I had downloaded the ABBYY monolingual dictionary (I don't know how I got it), but I can't find a way to get the English -> Russian version. I have always used the free Smirnitsky one, which is legally downloadable.

But that is not so important, because I really want to create my own. The main reason is that I created an open source tool that adds stress marks to all Russian words in an ebook (where the stress doesn't depend on context). This breaks almost all lookups in the traditional dictionaries. Additionally, I have taken some steps to gather data from Wiktionary and OpenRussian, which both contain lots of words, and OpenRussian has some German translations that could also be useful.
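To make the mismatch concrete: a stressed word differs from its dictionary form only by an extra combining character, so the two never compare equal. The sketch below assumes the stress is written as U+0301 (combining acute accent) after the vowel; the helper names are hypothetical and this is not taken from the tool mentioned above. It shows one way to derive both the stressed and the unstressed spelling, so that either could be listed as a lookup form.

```python
COMBINING_ACUTE = "\u0301"  # assumed stress mark, placed after the stressed vowel


def strip_stress(word):
    """Remove stress marks only; й and ё are left untouched."""
    return word.replace(COMBINING_ACUTE, "")


def lookup_forms(stressed_forms):
    """Return every form with and without stress, in order, without duplicates."""
    seen = {}
    for form in stressed_forms:
        for variant in (form, strip_stress(form)):
            seen.setdefault(variant, None)
    return list(seen)


if __name__ == "__main__":
    stressed = "роди" + COMBINING_ACUTE + "телей"
    print(stressed == "родителей")                # False: the code points differ
    print(strip_stress(stressed) == "родителей")  # True
    print(lookup_forms([stressed]))
```

Only U+0301 is removed, without Unicode decomposition, because decomposing first would also split й and ё into a base letter plus a combining mark.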

Your sample file was super helpful! Using it and your linked instructions, I managed to create a prototype dictionary that is supposed to work for stressed texts. And it works... a bit: maybe 30% of the lookups that should succeed actually do, and there does not seem to be any pattern to this. For example (all of these words appear in the XHTML files):

working:
роди́телей
забежа́ла
что́бы
себя́
штра́фа
необходи́мо

not working:
тюрьму́
заплати́ть
приняла́сь
тече́ние
вы́бралась
су́нул

I will have to gather more data on this; it is all very strange. I have uploaded the epub zip file in case someone feels inclined to take a look, and I will post an update once I figure out the mistake. The behaviour really seems random: when I added code that removed some duplicates, some words that had worked before suddenly stopped working, even though they were totally unrelated to the duplicate removal.
Attached Files
File Type: zip russian_dict_template.zip (3.20 MB, 63 views)
Old 12-10-2021, 08:12 AM   #4
MrBeef12
Junior Member
MrBeef12 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2021
Device: Kindle Paperwhite 7th Generation
After a lot of unsuccessful attempts to create the correct HTML code, I found the PyGlossary library, which generates the MOBI for me (using the following code). It still did not work, and it showed exactly the same error pattern, despite being a totally independent implementation.

My current guess is that this is caused by a bug in Kindle's word lookup algorithm. I can't prove it of course, but I did spend quite some time on creating a Spanish dictionary with the same code, and it works flawlessly!

I thank everyone who downloaded the files trying to find the error! I created a repository with my code for people who are interested. And for people who might be learning Spanish, I can share my Spanish-English Kindle dictionary, which might be the absolute best Spanish-English dictionary in terms of word coverage, number of translations and proper word linkages/inflections. (I will only try to find a workaround for some bugs in the Kindle word lookup algorithm before declaring it completely finished.)
Tags
dictionary, inflections, kindle

All times are GMT -4.