Old 12-20-2015, 02:37 PM   #151
Zylbath
Junior Member
Zylbath began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Dec 2015
Device: Kindle
Thank you for your answer.
Really? I tried exporting it from Excel as tab-delimited text. Strange.

I added the .zip with the four Excel files. The most important one is the Greenlandic_English one. Could you, or someone else, convert that into a tab-delimited .txt file so I can try again to convert it into a usable dictionary? Hopefully it will work then. (I know the file has some entries that won't be readable because they are inconsistently formatted, but that's fine. At least some are usable.)

The problem, though, is that Mobipocket Creator doesn't have a language code for Greenlandic (Kalaallisut). Honestly, that doesn't surprise me. I have always picked English (Canada) instead (it's the nearest to Greenland ^^).

I really appreciate your help. Thanks a lot!
Attached Files
File Type: zip Greenlandic.zip (2.48 MB, 1075 views)
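Since some rows are known to be inconsistently formatted, it can help to sort the exported lines before conversion. Below is a minimal Python sketch, assuming each usable line has exactly one headword and one definition separated by a single tab; the function name and the sample rows are made up for illustration.

```python
def split_valid_rows(lines):
    """Separate well-formed 'headword<TAB>definition' rows from the rest."""
    good, bad = [], []
    for line in lines:
        row = line.rstrip("\r\n")
        if not row.strip():
            continue  # ignore empty lines
        fields = row.split("\t")
        # A usable row has exactly two non-empty fields.
        if len(fields) == 2 and fields[0].strip() and fields[1].strip():
            good.append((fields[0].strip(), fields[1].strip()))
        else:
            bad.append(row)
    return good, bad


# Placeholder rows, not real dictionary data.
sample = [
    "headword1\tdefinition one\n",
    "broken line without a tab\n",
    "headword2\tdefinition two\textra column\n",
    "\n",
]
good, bad = split_valid_rows(sample)
print(len(good), "usable rows;", len(bad), "rows need manual fixing")
```

The rejected rows are kept rather than dropped so they can be fixed by hand later.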
Old 12-20-2015, 02:56 PM   #152
EbokJunkie
Addict
EbokJunkie can differentiate black from dark navy blue
 
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
Converted your txt file to utf8 and applied tab2opf with option -utf.
See attached.
Attached Files
File Type: zip Greenlandic-English.zip (1.72 MB, 582 views)
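For anyone who wants to do the UTF-8 step themselves, here is a rough Python sketch. It is not necessarily how the attached file was produced. It assumes the Excel export is in Windows-1252 (`cp1252`), which is only a guess; check the real encoding of your file first. The function and file names are hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Read a text file in src_encoding and write it back out as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dict_cp1252.txt")
    dst = os.path.join(tmp, "dict_utf8.txt")
    with open(src, "wb") as f:
        f.write("inédit\tsample definition\n".encode("cp1252"))
    reencode_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted.decode("utf-8"))
```

If the guessed source encoding is wrong, accented letters will come out garbled, so it is worth opening the result in an editor before running tab2opf on it.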
Old 12-20-2015, 03:21 PM   #153
Zylbath
Junior Member
Zylbath began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Dec 2015
Device: Kindle
Hey, thank you, thank you, thank you!
It works, I am fascinated. Of course, morphologically complex words often can't be found. (Greenlandic is a polysynthetic language in which 12 morphemes, with objects, attributes etc., are often combined in one word. I guess a dictionary couldn't handle that.) But the shorter words can now be looked up. I really thank you!

I wish you all Merry Christmas.
Greetings,
Kevin
Old 12-20-2015, 04:39 PM   #154
EbokJunkie
Addict
EbokJunkie can differentiate black from dark navy blue
 
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
BTW, Greenlandic language code is kal or kl.
See here.
You can edit opf file and recreate mobi.
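A minimal sketch of that OPF edit in Python, assuming the file contains the `DictionaryInLanguage` and `DictionaryOutLanguage` elements described in the Kindle Publishing Guidelines. The helper names and the sample fragment are made up, and whether the build tool then accepts the new code is a separate question.

```python
import re


def _set_element(opf_text, tag, value):
    """Replace the text content of every <tag>...</tag> element."""
    pattern = re.compile(r"(<%s>)[^<]*(</%s>)" % (tag, tag))
    return pattern.sub(lambda m: m.group(1) + value + m.group(2), opf_text)


def set_dictionary_languages(opf_text, in_lang, out_lang):
    """Set the lookup (in) and definition (out) language codes."""
    opf_text = _set_element(opf_text, "DictionaryInLanguage", in_lang)
    return _set_element(opf_text, "DictionaryOutLanguage", out_lang)


# Hypothetical fragment of an OPF file; a real file has much more metadata.
sample_opf = (
    "<x-metadata>\n"
    "  <DictionaryInLanguage>en-us</DictionaryInLanguage>\n"
    "  <DictionaryOutLanguage>en-us</DictionaryOutLanguage>\n"
    "</x-metadata>\n"
)
updated = set_dictionary_languages(sample_opf, "kl", "en")
print(updated)
```

The same change can of course be made by hand in a text editor; the script only saves time when several files need it.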
Old 12-20-2015, 06:20 PM   #155
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Zylbath View Post
Hey, thank you, thank you, thank you!
It works, I am fascinated. Of course, morphologically complex words often can't be found. (Greenlandic is a polysynthetic language in which 12 morphemes, with objects, attributes etc., are often combined in one word.)
I'm not familiar with Greenlandic morphology, but if it has somewhat predictable patterns, you might be able to add inflections.
E.g., if a hypothetical Greenlandic word ABCD often occurs with a prefix aaa and/or a suffix bbb, you could add them as inflections (aaaABCD, ABCDbbb, aaaABCDbbb etc.).
For more information on inflections, see the Kindle Publishing Guidelines and the Mobipocket website.
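The affix idea can be sketched in a few lines of Python. This is only an illustration with the placeholder word and affixes from the example; `affix_forms` is a made-up helper, and it ignores any sound changes at morpheme boundaries.

```python
from itertools import product


def affix_forms(stem, prefixes=(), suffixes=()):
    """Return every prefix/suffix combination of stem, excluding the bare stem."""
    forms = []
    # The empty string stands for "no prefix" or "no suffix".
    for pre, suf in product(("",) + tuple(prefixes), ("",) + tuple(suffixes)):
        form = pre + stem + suf
        if form != stem and form not in forms:
            forms.append(form)
    return forms


print(affix_forms("ABCD", ["aaa"], ["bbb"]))
# ['ABCDbbb', 'aaaABCD', 'aaaABCDbbb']
```

Each generated form would then be listed as an inflection of the headword in the dictionary source.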

Quote:
Originally Posted by EbokJunkie View Post
BTW, Greenlandic language code is kal or kl.
See here.
You can edit opf file and recreate mobi.
Unfortunately, the language codes in kindlegen/mobigen are hardcoded, and if you use a language code that is not in the following list, you'll get an error message.

Spoiler:
af Afrikaans
sq Albanian
ar Arabic
ar-dz Arabic (Algeria)
ar-bh Arabic (Bahrain)
ar-eg Arabic (Egypt)
ar-iq Arabic (Iraq)
ar-jo Arabic (Jordan)
ar-kw Arabic (Kuwait)
ar-lb Arabic (Lebanon)
ar-ly Arabic (Libya)
ar-ma Arabic (Morocco)
ar-om Arabic (Oman)
ar-qa Arabic (Qatar)
ar-sa Arabic (Saudi Arabia)
ar-sy Arabic (Syria)
ar-tn Arabic (Tunisia)
ar-ae Arabic (U.A.E.)
ar-ye Arabic (Yemen)
hy Armenian
az Azeri (Cyrillic)
az Azeri (Latin)
eu Basque
be Belorussian
bn Bengali
bg Bulgarian
ca Catalan
zh Chinese
zh-hk Chinese (Hong Kong)
zh-cn Chinese (PRC)
zh-sg Chinese (Singapore)
zh-tw Chinese (Taiwan)
hr Croatian
cs Czech
da Danish
nl Dutch
nl-be Dutch (Belgium)
en English
en-au English (Australia)
en-bz English (Belize)
en-ca English (Canada)
en-ie English (Ireland)
en-jm English (Jamaica)
en-nz English (New Zealand)
en-ph English (Philippines)
en-za English (South Africa)
en-tt English (Trinidad)
en-gb English (United Kingdom)
en-us English (United States)
en-zw English (Zimbabwe)
et Estonian
fo Faeroese
fa Farsi
fi Finnish
fr-be French (Belgium)
fr-ca French (Canada)
fr French
fr-lu French (Luxembourg)
fr-mc French (Monaco)
fr-ch French (Switzerland)
ka Georgian
de German
de-at German (Austria)
de-li German (Liechtenstein)
de-lu German (Luxembourg)
de-ch German (Switzerland)
el Greek
gu Gujarati
he Hebrew
hi Hindi
hu Hungarian
is Icelandic
id Indonesian
it Italian
it-ch Italian (Switzerland)
ja Japanese
kn Kannada
kk Kazakh
x-kok Konkani
ko Korean
lv Latvian
lt Lithuanian
mk Macedonian
ms Malay (Brunei Darussalam)
ms Malay (Malaysia)
ml Malayalam
mt Maltese
mr Marathi
ne Nepali
no Norwegian (Bokmal)
no Norwegian (Nynorsk)
or Oriya
pl Polish
pt Portuguese
pt-br Portuguese (Brazil)
pa Punjabi
rm Rhaeto-Romanic
ro Romanian
ro-mo Romanian (Moldova)
ru Russian
ru-mo Russian (Moldova)
sz Sami (Lappish)
sa Sanskrit
sr Serbian (Cyrillic)
sr Serbian (Latin)
sk Slovak
sl Slovenian
sb Sorbian
es Spanish
es-ar Spanish (Argentina)
es-bo Spanish (Bolivia)
es-cl Spanish (Chile)
es-co Spanish (Colombia)
es-cr Spanish (Costa Rica)
es-do Spanish (Dominican Republic)
es-ec Spanish (Ecuador)
es-sv Spanish (El Salvador)
es-gt Spanish (Guatemala)
es-hn Spanish (Honduras)
es-mx Spanish (Mexico)
es-ni Spanish (Nicaragua)
es-pa Spanish (Panama)
es-py Spanish (Paraguay)
es-pe Spanish (Peru)
es-pr Spanish (Puerto Rico)
es-uy Spanish (Uruguay)
es-ve Spanish (Venezuela)
sx Sutu
sw Swahili
sv Swedish
sv-fi Swedish (Finland)
ta Tamil
tt Tatar
te Telugu
th Thai
ts Tsonga
tn Tswana
tr Turkish
uk Ukrainian
ur Urdu
uz Uzbek (Cyrillic)
uz Uzbek (Latin)
vi Vietnamese
xh Xhosa
zu Zulu
Old 12-20-2015, 06:38 PM   #156
Zylbath
Junior Member
Zylbath began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Dec 2015
Device: Kindle
Quote:
BTW, Greenlandic language code is kal or kl.
See here.
You can edit opf file and recreate mobi.
Yeah, I tried it with kl only and it resulted in an error. But with kal it worked now, thanks.

Quote:
I'm not familiar with Greenlandic morphology, but if it has somewhat predictable patterns, you might be able to add inflections.
E.g., if a hypothetical Greenlandic word ABCD often occurs with a prefix aaa and/or a suffix bbb, you could add them as inflections (aaaABCD, ABCDbbb, aaaABCDbbb etc.).
For more information on inflections, see the Kindle Publishing Guidelines and the Mobipocket website.
That would be a life's work. Greenlandic only has suffixes, but there are 1,500 of them, plus the 4,000 stems they can be attached to. The other thing is that many suffixes swallow the consonants before them, or assimilate with them into a single lengthened consonant, so the word changes almost beyond recognition. Well, not totally, otherwise the language couldn't be understood. But I guess building such a dictionary would be a life's work. The dictionary you converted for me already has thousands of inflected forms. But it is only by chance that a word you want to look up matches one of them; there are just too many combinations with 1,500 suffixes and words that often have 12 morphemes or more. Still, shorter words are recognised and the dictionary is actually pretty useful. So, thanks again!
Old 12-20-2015, 07:11 PM   #157
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Zylbath View Post
Yeah, I tried it with kl only and it resulted in an error. But with kal it worked now, thanks.
Actually, kal did not work. KindleGen/Mobigen ignores the third letter and treats kal like ka = Georgian.
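To make the two outcomes concrete, here is a toy Python model of the behaviour reported in this thread. It is not KindleGen's actual code; the truncation rule and the small `SUPPORTED` table (a subset of the list above) are assumptions for illustration.

```python
# Small subset of the hardcoded list, for illustration only.
SUPPORTED = {
    "ka": "Georgian",
    "kk": "Kazakh",
    "kn": "Kannada",
    "ko": "Korean",
    "en-ca": "English (Canada)",
}


def resolve_language(code):
    """Model how a language code might be interpreted (assumed behaviour)."""
    code = code.lower()
    if "-" not in code and len(code) == 3:
        code = code[:2]  # third letter is ignored, as reported above
    if code not in SUPPORTED:
        raise ValueError("unsupported language code: %r" % code)
    return code, SUPPORTED[code]


print(resolve_language("kal"))
try:
    resolve_language("kl")
except ValueError as err:
    print("error:", err)
```

In this model kal silently becomes Georgian, while kl is rejected because it is not in the table.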

Quote:
Originally Posted by Zylbath View Post
That would be a life's work. Greenlandic only has suffixes, but there are 1,500 of them, plus the 4,000 stems they can be attached to.
Search Google for "Open Source Greenlandic POS Tagger Stemmer". Maybe one of the projects has a list containing each headword and all its possible inflected forms, because some of the simpler stemmers require such a list to reduce inflected forms to headwords. (FST stemmers usually don't come with human-readable word databases.)
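If such a list turns up, it would probably map each inflected form to its headword, and it would need to be turned around for dictionary use. A minimal Python sketch, assuming the list is available as (inflected form, headword) pairs; the pair format and the placeholder words are assumptions, not taken from any real stemmer project.

```python
from collections import defaultdict


def group_by_headword(pairs):
    """Turn (inflected_form, headword) pairs into headword -> list of forms."""
    grouped = defaultdict(list)
    for inflected, headword in pairs:
        forms = grouped[headword]  # also registers headwords without forms
        if inflected != headword and inflected not in forms:
            forms.append(inflected)
    return dict(grouped)


# Placeholder data, not real Greenlandic words.
pairs = [
    ("ABCDbbb", "ABCD"),
    ("ABCDccc", "ABCD"),
    ("ABCD", "ABCD"),
    ("WXYZbbb", "WXYZ"),
]
print(group_by_headword(pairs))
```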

Quote:
Originally Posted by Zylbath View Post
So, thanks again!
I didn't do anything. EbokJunkie converted your file for you.
Old 12-20-2015, 07:37 PM   #158
Zylbath
Junior Member
Zylbath began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Dec 2015
Device: Kindle
It did not work? Damn. But then it has to be Georgian; nobody reads Georgian anyway. =P

There was once a page where words were analysed automatically and put into a morphological tree. I can't remember the site, though; all I know is this: http://giellatekno.uit.no/cgi/d-kal.eng.html It can disambiguate or hyphenate the input. But I don't know whether they have a database that could be used. Most of their functionality doesn't work anymore; only a few analysis tools are still available.

But I can't really find anything with the keywords you gave. There is something like that for Inuktitut, which is closely related to Greenlandic, but every morpheme would have to be changed because of sound changes.
Old 01-09-2017, 12:55 PM   #159
Teom@n
Enthusiast
Teom@n began at the beginning.
 
Posts: 47
Karma: 10
Join Date: Dec 2014
Location: Lyon
Device: Kindle PW3, Kobo Libra H2O
tab2opf split my txt file into 4 HTML files, but it doesn't use UTF-8. I have a problem with special letters. How can I solve this problem?

I tried different methods (calibre, etc.) of creating HTML to use with mobigen/kindlegen, but then my Kindle doesn't recognise my dictionary as a dictionary.
Old 01-09-2017, 01:28 PM   #160
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Teom@n View Post
tab2opf split my txt file into 4 HTML files.
That's the normal behavior. It'll split up large files into several smaller files.
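The splitting can be pictured with a short Python sketch. This is an illustration of the idea, not tab2opf's actual code; the chunk size and the output file names are hypothetical, since the real threshold is not stated here.

```python
def chunk_entries(entries, chunk_size):
    """Split a list of entries into consecutive chunks of at most chunk_size."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


# 35 placeholder entries with a chunk size of 10 give 4 output files.
entries = ["entry%d" % n for n in range(35)]
chunks = chunk_entries(entries, 10)
for index, chunk in enumerate(chunks):
    print("dict%d.html would hold %d entries" % (index, len(chunk)))
```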

Quote:
Originally Posted by Teom@n View Post
[...] but it doesn't use UTF-8. I have a problem with special letters. How can I solve this problem?
You'll need to specify the -utf8 command line parameter:

Code:
python tab2opf.py -utf8 dict.txt
Quote:
Originally Posted by Teom@n View Post
I tried different methods (calibre, etc.) of creating HTML to use with mobigen/kindlegen, but then my Kindle doesn't recognise my dictionary as a dictionary.
You can't generate dictionaries with Calibre, you'll need to use Kindlegen. The source files also need to be formatted according to the Kindle Publishing Guidelines. I.e., you'll have to post-edit the files generated by tab2opf.py.

Out of curiosity: what are the input and output languages?
Old 01-09-2017, 01:34 PM   #161
Teom@n
Enthusiast
Teom@n began at the beginning.
 
Posts: 47
Karma: 10
Join Date: Dec 2014
Location: Lyon
Device: Kindle PW3, Kobo Libra H2O
Quote:
Originally Posted by Doitsu View Post
That's the normal behavior. It'll split up large files into several smaller files.



You'll need to specify the -utf8 command line parameter:

Code:
python tab2opf.py -utf8 dict.txt

You can't generate dictionaries with Calibre, you'll need to use Kindlegen. The source files also need to be formatted according to the Kindle Publishing Guidelines. I.e., you'll have to post-edit the files generated by tab2opf.py.

Out of curiosity: what are the input and output languages?
Firstly, Thanks.

Could you explain how I should use the Python code?

Last edited by Teom@n; 01-09-2017 at 02:28 PM.
Old 01-09-2017, 01:39 PM   #162
Teom@n
Enthusiast
Teom@n began at the beginning.
 
Posts: 47
Karma: 10
Join Date: Dec 2014
Location: Lyon
Device: Kindle PW3, Kobo Libra H2O
Quote:
Originally Posted by Doitsu View Post
That's the normal behavior. It'll split up large files into several smaller files.



You'll need to specify the -utf8 command line parameter:

Code:
python tab2opf.py -utf8 dict.txt

You can't generate dictionaries with Calibre, you'll need to use Kindlegen. The source files also need to be formatted according to the Kindle Publishing Guidelines. I.e., you'll have to post-edit the files generated by tab2opf.py.

Out of curiosity: what are the input and output languages?

I'm trying to make a French-Turkish dictionary. My tools are:

-tab2opf.exe
-mobigen.exe

-kindlegen
-notepad++

Python 3 is installed on my PC.

edit:
I installed Python 2.7 and converted my dictionary. It works. Now I have to solve hyphenation. How should I edit and optimize my database for the best search results? I have a problem with inflected forms such as "inédit, inédite, inédites", and some cells contain several words, such as "lequel, laquelle, lesquels, lesquelles".

My database looks like this:

aboyant, ~e*
aboyer*
aboyeur, ~euse*
abracadabrant, ~e*
abraser*
abrasif, ~ive*
abreuvement, abreuvage*
abréviatif, ~ive*
abricoté, ~e*
abricotier*
.
.
.
lequel, laquelle, lesquels, lesquelles

Last edited by Teom@n; 01-09-2017 at 03:20 PM.
Old 01-10-2017, 04:46 AM   #163
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Teom@n View Post
I installed Python 2.7 and converted my dictionary. It works. Now I have to solve hyphenation. How should I edit and optimize my database for the best search results? I have a problem with inflected forms such as "inédit, inédite, inédites", and some cells contain several words, such as "lequel, laquelle, lesquels, lesquelles".
You'll need to explicitly define them as inflections. E.g.

Code:
<html>
<body>

<idx:entry>
	<b><idx:orth>cheval
	<idx:infl>
		<idx:iform value="chevaux"/>
	</idx:infl>
	</idx:orth> </b> 
	beygir 
</idx:entry>

<hr/>

<idx:entry>
	<b><idx:orth>feu
	<idx:infl>
		<idx:iform value="feux"/>
	</idx:infl>
	</idx:orth> </b> 
	ateş, yangın
</idx:entry>

</body>
</html>
For French verbs you might find the dsl2mobi French verb inflection list helpful.
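For cells that already list several forms, such as "lequel, laquelle, lesquels, lesquelles", the markup above could be generated with a small script. Here is a rough Python sketch, assuming the first comma-separated word is the headword and the rest are inflections. `render_entry` is a made-up helper; it simply skips "~" shorthand forms (those would need expanding by hand) and does not handle the trailing asterisks in the sample data.

```python
from html import escape


def render_entry(cell, definition):
    """Build one <idx:entry> block from a comma-separated cell of word forms."""
    forms = [part.strip() for part in cell.split(",")]
    # Drop empty parts and "~e"-style shorthand that cannot be used as-is.
    forms = [form for form in forms if form and not form.startswith("~")]
    headword, inflections = forms[0], forms[1:]

    lines = ["<idx:entry>", "\t<b><idx:orth>" + escape(headword)]
    if inflections:
        lines.append("\t<idx:infl>")
        for form in inflections:
            lines.append('\t\t<idx:iform value="%s"/>' % escape(form, quote=True))
        lines.append("\t</idx:infl>")
    lines.append("\t</idx:orth> </b>")
    lines.append("\t" + escape(definition))
    lines.append("</idx:entry>")
    return "\n".join(lines)


# The definition text is a placeholder.
print(render_entry("lequel, laquelle, lesquels, lesquelles", "DEFINITION GOES HERE"))
```

The output follows the same nesting as the hand-written example, so it can be pasted between the body tags of a generated HTML file.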
Old 01-10-2017, 08:40 AM   #164
Teom@n
Enthusiast
Teom@n began at the beginning.
 
Posts: 47
Karma: 10
Join Date: Dec 2014
Location: Lyon
Device: Kindle PW3, Kobo Libra H2O
Quote:
Originally Posted by Doitsu View Post
You'll need to explicitly define them as inflections. E.g.

Code:
<html>
<body>

<idx:entry>
	<b><idx:orth>cheval
	<idx:infl>
		<idx:iform value="chevaux"/>
	</idx:infl>
	</idx:orth> </b> 
	beygir 
</idx:entry>

<hr/>

<idx:entry>
	<b><idx:orth>feu
	<idx:infl>
		<idx:iform value="feux"/>
	</idx:infl>
	</idx:orth> </b> 
	ateş, yangın
</idx:entry>

</body>
</html>
For French verbs you might find the dsl2mobi French verb inflection list helpful.
I have the latest official ODS7 and I converted it to Excel (415,000 words).
I will try to fill it with Turkish explanations, then convert stardict-text-html-mobi.
What do you think? Could it be better?
Old 01-10-2017, 11:23 AM   #165
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Teom@n View Post
I have the latest official ODS7 and I converted it to Excel (415,000 words).
I will try to fill it with Turkish explanations, then convert stardict-text-html-mobi.
What do you think? Could it be better?
That depends on your technical skills. If your MS Excel spreadsheet contains only two columns you could save it as a tab-delimited text file and process it with tab2opf.py.
If you're familiar with regular expressions, you could also convert the tab-delimited text file directly using a couple of regular expressions.
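As a rough idea of the regular expression route, here is a minimal Python sketch. It assumes exactly two tab-separated columns per line and does not escape HTML special characters; the pattern, template and function name are illustrative, and the two sample rows reuse the earlier cheval/feu example.

```python
import re

# One headword and one definition per line, separated by a single tab.
ROW = re.compile(r"^([^\t\r\n]+)\t([^\t\r\n]+)$", re.MULTILINE)

# \\1 is the headword and \\2 the definition; newlines and tabs are literal.
TEMPLATE = "<idx:entry>\n\t<b><idx:orth>\\1</idx:orth></b>\n\t\\2\n</idx:entry>\n<hr/>"


def tab_to_entries(text):
    """Convert tab-delimited rows into idx:entry markup."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    return ROW.sub(TEMPLATE, text)


sample = "cheval\tbeygir\nfeu\tateş, yangın\n"
print(tab_to_entries(sample))
```

Rows that do not have exactly two columns are left unchanged, which makes them easy to spot in the output.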
Tags
ebook tools, kindle tools


All times are GMT -4.