Old 12-17-2013, 02:35 PM   #1
Difermo
Member
Difermo began at the beginning.
 
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
Need help with a dictionary

Sorry if this is the wrong place to ask; I hope the moderators will move it to where it belongs.

I need to create a Serbian-Serbian dictionary. I have read all the posts here about creating dictionaries and I'm a little confused, so I hope someone can help. I have a printed dictionary that I want to scan. After scanning and OCR, I will have a file (say, filename.docx) containing over 40,000 words.
After editing it (finding and fixing all the OCR mistakes), I will save it as plain text (UTF-8) as dictionary.txt. What next? If I understand correctly, either I have to create a tab-delimited file from dictionary.txt, or dictionary.txt is itself the tab-delimited file, which I then have to run through a Python script that requires Python 2.7 (not 2.7.2). After that I will get dictionary.opf.
After running Mobigen on dictionary.opf, I will get dictionary.mobi, which will be the final product.
1) Is this procedure OK?
2) What is a tab-delimited file?
3) Is StarDict actually an online format for all kinds of dictionaries, which I would need to convert to dictionary.txt and then run through the procedure I asked about?
4) What about dictionary.prc? I see that some people create dictionary.prc (without the .mobi extension). What is the difference between them, and does a Kindle device read both of them the same way?
5) MOST IMPORTANT QUESTION: Must the inflection code be on every word (more than 40,000), or must it be written at the beginning of dictionary.txt, or at the end of dictionary.txt?
Is there any script to add all the inflections?

Thanks, and sorry for my bad English.
Old 12-17-2013, 02:43 PM   #2
susan_cassidy
Wizard
susan_cassidy ought to be getting tired of karma fortunes by now.
 
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
You have to know the inflections to code them. For a mobipocket dictionary, you have to follow the documentation given here: http://www.mobipocket.com/dev/articl...e=indexing.htm. It requires a lot of extra HTML coding. A .prc file should be about the same as a .mobi file, usually. .mobi would be preferred. I don't know if a Kindle, for example, would allow a dictionary with a .prc suffix. The dictionary would not be tab-delimited, but all in HTML.

Tab-delimited format is where each field is separated by tabs. Sometimes the text fields are quoted, in addition. There can be issues if the text field has newlines or tabs in it. Not sure what the rules are for that. Google is your friend. Maybe you use backslash quoting for things like that, like \t instead of tab, \n instead of newline, etc., depending on what is going to be reading the file.
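As a rough illustration of the above, here is a minimal Python 3 sketch that writes one dictionary entry per tab-delimited line and backslash-escapes any tabs or newlines inside a field. The function names are made up for this example, and whether the program that later reads the file understands `\t` and `\n` escapes is an assumption that has to be checked against that program.

```python
def escape_field(text):
    """Backslash-escape characters that would break a tab-delimited line."""
    return (text.replace("\\", "\\\\")   # escape existing backslashes first
                .replace("\t", "\\t")    # a real tab would look like a field separator
                .replace("\n", "\\n"))   # a real newline would look like a new record


def to_tsv_line(headword, definition):
    """Join two fields with a single tab character (hypothetical helper)."""
    return escape_field(headword) + "\t" + escape_field(definition)


def split_tsv_line(line):
    """Split a line back into (headword, definition) at the first tab."""
    headword, definition = line.rstrip("\r\n").split("\t", 1)
    return headword, definition


line = to_tsv_line("suptrahend", "(lat. subtrahendus) broj koji treba oduzeti od drugog broja")
print(split_tsv_line(line))
```

Splitting at the first tab only means that a stray unescaped tab inside the definition does not create a third field.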
Old 12-17-2013, 03:21 PM   #3
Difermo
Hmm, I read that site but I still don't understand. Must every word be wrapped in <idx:orth></idx:orth> tags, or is that done separately? For example, here are 3 words. How should they be written so that the dictionary and inflections work on a Kindle? I will put "-" to mark a new tab and a new word.

"-"suptraktivan (nlat. subtilitas) koji se moze odbiti ili odracunati
"-"suptrakcija (nlat. subtractio) mat. odvijanje, oduzimanje
"-"suptrahend (lat. subtrahendus) broj koji treba oduzeti od drugog broja
Old 12-17-2013, 03:34 PM   #4
susan_cassidy
Yes, every word must be done separately with the idx tags. And, the inflections, if any, have to be done the way the example on the web page I referenced works. I don't get your fascination with tabs, there are no tabs used in HTML.
Old 12-17-2013, 03:42 PM   #5
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by susan_cassidy View Post
A .prc file should be about the same as a .mobi file, usually. .mobi would be preferred.
That's not the case for dictionaries, since Amazon hasn't ported the Kindle dictionary format to KF8. I.e., in terms of functionality it doesn't make a difference whether dictionaries are generated with the older mobigen.exe or the current kindlegen.exe. (Kindle Dictionaries cannot be generated with Calibre.)

Quote:
Originally Posted by susan_cassidy View Post
I don't know if a Kindle, for example, would allow a dictionary with a .prc suffix.
In that case you may want to refrain from answering dictionary related questions. BTW, the answer is yes, Kindles and Kindle apps accept both .prc and .mobi dictionaries.

Quote:
Originally Posted by susan_cassidy View Post
The dictionary would not be tab-delimited, but all in HTML.
Kindles and Kindle apps do not support uncompiled HTML dictionaries; they obviously need to be compiled.

Quote:
Originally Posted by Difermo View Post
After editing it (finding and fixing all the OCR mistakes), I will save it as plain text (UTF-8) as dictionary.txt. What next?
That depends on your technical skills. If you know your way around a Unicode editor that supports regular expressions, you could:

A) Use lots of search and replace operations (or a custom script) to add the required Mobipocket dictionary tags, manually create an .opf file with file references and open it with MobiGen.exe or KindleGen.exe to generate the dictionary.

Each entry in the HTML source file should look more or less like this (I added spaces to make the code more readable; they're optional):

Code:
<html>
<body>

<idx:entry>
	<b><idx:orth>book
	<idx:infl>
		<idx:iform value="books"/>
	</idx:infl>
	</idx:orth> </b> 
	<i>noun</i> <br/>
	a written or printed work
</idx:entry>
<br/><br/>
<hr/>
<idx:entry>
	<b><idx:orth>go
	<idx:infl>
		<idx:iform value="goes"/>
		<idx:iform value="going"/>
		<idx:iform value="went"/>
		<idx:iform value="gone"/>
	</idx:infl>
	</idx:orth> </b> 
	<i>verb</i> <br/>
	move from one place or point to another
</idx:entry>
<br/><br/>

</body>
</html>
B) Use lots of search and replace operations to change the text file in a way that you end up with a UTF8 text file that contains lines with the following format:

Code:
Headword<TAB>Definition<CR/LF>
<TAB> stands for the invisible tab character and <CR/LF> for the line-break that you create when you press Enter. (Unix-style line-breaks are OK, too.)
Once you've done that, you can use tab2opf.py to generate the source files required for MobiGen/KindleGen. Alternatively, you could use the files provided on this website.
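The search-and-replace step in option B could also be scripted. The following Python 3 sketch is only one possible approach and rests on a loud assumption: every OCR line starts with a single-word headword, followed by a space and then the rest of the entry (as in the three "suptra..." examples earlier in the thread). Multi-word headwords or entries that wrap over several lines would need extra handling. All names here are hypothetical.

```python
import re

# Assumption: the headword is the first run of characters that contains
# neither whitespace nor "(", and everything after it is the definition.
ENTRY_RE = re.compile(r"^\s*(?P<head>[^\s(]+)\s+(?P<definition>.+?)\s*$")


def line_to_tab_delimited(line):
    """Return 'Headword<TAB>Definition', or None if the line does not fit."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return match.group("head") + "\t" + match.group("definition")


def convert(lines):
    """Convert all matching lines and silently skip blank or odd ones."""
    converted = (line_to_tab_delimited(line) for line in lines)
    return [row for row in converted if row is not None]


sample = [
    "suptraktivan (nlat. subtilitas) koji se moze odbiti ili odracunati",
    "suptrakcija (nlat. subtractio) mat. odvijanje, oduzimanje",
    "",
]
rows = convert(sample)
print("\n".join(rows))
```

For a real 40,000-entry file it would be safer to log the skipped lines instead of dropping them, so that OCR damage can be fixed by hand.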

Quote:
Originally Posted by Difermo View Post
3) Is StarDict actually an online format for all kinds of dictionaries, which I would need to convert to dictionary.txt?
The StarDict format is a completely different format. However, if you happen to find a Serbian-Serbian StarDict or Babylon BGL dictionary, you could use another tool, PyGlossary, to convert it to a tab-delimited file that you could use as the input file for tab2opf.py.

Quote:
Originally Posted by Difermo View Post
4) What about dictionary.prc? I see that some people create dictionary.prc (without the .mobi extension). What is the difference between them, and does a Kindle device read both of them the same way?
PRC dictionaries were created with the older Mobigen.exe and MOBI dictionaries with KindleGen.exe; otherwise they're more or less identical.

Quote:
Originally Posted by Difermo View Post
5) Must the inflection code be on every word (more than 40,000), or must it be written at the beginning of dictionary.txt, or at the end of dictionary.txt? Is there any script to add all the inflections?
Unfortunately, tab2opf.py does not support inflections. You'll need to find someone who can write a custom script for you that adds them to each entry in the required format. In my example dictionary code, inflections are coded in the <idx:infl>...</idx:infl> block.
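To give an idea of what such a custom script might look like, here is a minimal Python 3 sketch that emits one entry in the same shape as the example code above. It is not part of tab2opf.py; the function name and the `INFLECTIONS` lookup table are hypothetical, and the list of inflected forms for each headword still has to come from somewhere (the script cannot guess them).

```python
from html import escape


def build_entry(headword, definition, inflections=()):
    """Return one <idx:entry> block, with an <idx:infl> block if forms are given."""
    parts = ["<idx:entry>", "\t<b><idx:orth>" + escape(headword)]
    if inflections:
        parts.append("\t<idx:infl>")
        for form in inflections:
            # quote=True also escapes double quotes inside the attribute value
            parts.append('\t\t<idx:iform value="%s"/>' % escape(form, quote=True))
        parts.append("\t</idx:infl>")
    parts.append("\t</idx:orth></b>")
    parts.append("\t" + escape(definition))
    parts.append("</idx:entry>")
    return "\n".join(parts)


# Hypothetical lookup table: headword -> list of inflected forms.
INFLECTIONS = {"go": ["goes", "going", "went", "gone"]}

entry = build_entry("go", "move from one place or point to another",
                    INFLECTIONS.get("go", ()))
print(entry)
```

Headwords without an entry in the table simply get no `<idx:infl>` block, so the same function can be run over all entries.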

You may also want to read the Kindle Dictionary FAQ, which will answer many of your questions.
Old 12-17-2013, 03:52 PM   #6
Difermo
new tab = new line (Enter key)
So where do I place the inflections? Can you (or anybody else) show how these 3 words should be written? SUPTRA should be the inflection.
suptraktivan, suptrakcija, suptrahend are the words.
(nlat. subtilitas), (nlat. subtractio), (lat. subtrahendus) are the Latin meanings
and the rest is the description of the word.
Old 12-17-2013, 04:01 PM   #7
Difermo
@Doitsu

Thanks. Since I didn't refresh the page, I didn't notice that you had posted a comment (I was writing a new one). I understand almost everything; I just have to ask something about inflections.

So after creating the .opf file from dictionary.txt (without inflections), can I open it later and insert all the inflections manually? The code you showed looks like what I practice with in Sigil! Do I write it there?
Old 12-17-2013, 04:20 PM   #8
Doitsu
Quote:
Originally Posted by Difermo View Post
new tab = new line (Enter key)
So where do I place the inflections? Can you (or anybody else) show how these 3 words should be written? SUPTRA should be the inflection.
suptraktivan, suptrakcija, suptrahend are the words.
(nlat. subtilitas), (nlat. subtractio), (lat. subtrahendus) are the Latin meanings and the rest is the description of the word.
Usually, the word inflection is used for inflected word forms. For example, the English verb to go has 4 inflections: goes, going, gone and went.

It looks like what you're looking for are derived words. If that's what you mean, simply wrap them in <idx:orth>...</idx:orth> tags, too. (Each <idx:entry>...</idx:entry> can contain multiple <idx:orth>...</idx:orth> entries.)

Unfortunately, tab2opf.py cannot handle this; you'll have to manually edit the .html files that it creates to group related dictionary entries under the headword that they're derived from.
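If editing by hand is too much work for a large file, the grouping could be sketched in Python 3 roughly as follows. This is only an illustration of "one <idx:entry> with several <idx:orth> elements"; the function name is hypothetical, and deciding which words belong together (here, the three "suptra..." words) is assumed to have been done already.

```python
from html import escape


def build_grouped_entry(words):
    """words: list of (headword, definition) pairs that share one <idx:entry>."""
    lines = ["<idx:entry>"]
    for headword, definition in words:
        # Each related word gets its own <idx:orth> so that it can be looked up.
        lines.append("\t<b><idx:orth>%s</idx:orth></b> %s<br/>"
                     % (escape(headword), escape(definition)))
    lines.append("</idx:entry>")
    return "\n".join(lines)


related = [
    ("suptraktivan", "(nlat. subtilitas) koji se moze odbiti ili odracunati"),
    ("suptrakcija", "(nlat. subtractio) mat. odvijanje, oduzimanje"),
    ("suptrahend", "(lat. subtrahendus) broj koji treba oduzeti od drugog broja"),
]
grouped = build_grouped_entry(related)
print(grouped)
```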

Quote:
Originally Posted by Difermo View Post
So after creating the .opf file from dictionary.txt (without inflections), can I open it later and insert all the inflections manually? The code you showed looks like what I practice with in Sigil! Do I write it there?
Sigil is an ePub editor and not suitable for generating Kindle dictionary source files. However, there are lots of free Unicode text editors that also offer HTML syntax highlighting, for example, Notepad++.
Old 12-17-2013, 04:33 PM   #9
susan_cassidy
Obviously, I was talking about HTML as source for the dictionary. It would have to be compiled into a usable format.
Old 12-17-2013, 04:42 PM   #10
Difermo
I understand inflections now; I just need to understand the editing and I'm ready to go.
Quote:
Originally Posted by Doitsu View Post
Unfortunately, tab2opf.py cannot handle this; you'll have to manually edit the .html files that it creates to group related dictionary entries under the headword that they're derived from.

Sigil is an ePub editor and not suitable for generating Kindle dictionary source files. However, there are lots of free Unicode text editors that also offer HTML syntax highlighting, for example, Notepad++.
Isn't an EPUB just zipped HTML files? Can I copy everything from dictionary.txt into Sigil, save it as an EPUB, and then convert it to MOBI? If not, do I need Notepad++ to open dictionary.opf?
Old 12-17-2013, 04:57 PM   #11
Doitsu
Quote:
Originally Posted by Difermo View Post
Isn't an EPUB just zipped HTML files?
That's correct, but Kindle dictionary source files contain non-standard tags that Sigil cannot handle, e.g. <idx:entry>. Since Sigil usually deletes all non-standard tags, you'll lose a lot of work if you open the .html files generated by tab2opf.py with it.

Quote:
Originally Posted by Difermo View Post
If not, do I need Notepad++ to open dictionary.opf?
dictionary.opf is an XML project file that can be edited with any text editor.
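For orientation, here is a Python 3 sketch that writes out a small .opf of the kind meant here. The element layout (in particular the `<x-metadata>` block with `DictionaryInLanguage`/`DictionaryOutLanguage` and the `text/x-oeb1-document` media type) is recalled from the Mobipocket/Kindle dictionary documentation and should be treated as an assumption to verify, not as the exact file that tab2opf.py produces. The title, identifier and file names are placeholders. Note that the .opf only points at the .html files; the entries and inflections themselves live in the .html.

```python
import xml.etree.ElementTree as ET

# Assumed layout of a dictionary project file; values are inserted as-is,
# so they must not contain characters such as "<" or "&".
OPF_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core">
      <dc:Title>{title}</dc:Title>
      <dc:Language>{lang}</dc:Language>
      <dc:Identifier id="uid">{uid}</dc:Identifier>
    </dc-metadata>
    <x-metadata>
      <DictionaryInLanguage>{lang}</DictionaryInLanguage>
      <DictionaryOutLanguage>{lang}</DictionaryOutLanguage>
    </x-metadata>
  </metadata>
  <manifest>
    <item id="dictionary0" href="{html_file}" media-type="text/x-oeb1-document"/>
  </manifest>
  <spine>
    <itemref idref="dictionary0"/>
  </spine>
</package>
"""


def build_opf(title, lang, uid, html_file):
    """Fill in the template (hypothetical helper, placeholder values)."""
    return OPF_TEMPLATE.format(title=title, lang=lang, uid=uid, html_file=html_file)


opf_text = build_opf("Serbian-Serbian Dictionary", "sr", "dictionary-sr-sr",
                     "dictionary0.html")
# Raises xml.etree.ElementTree.ParseError if the result is not well-formed XML.
ET.fromstring(opf_text.encode("utf-8"))
print(opf_text)
```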
Old 12-17-2013, 05:31 PM   #12
Difermo
So after creating dictionary.opf from dictionary.txt, I can open dictionary.opf with Notepad++ and edit all the inflections?
Old 12-17-2013, 05:32 PM   #13
susan_cassidy
The inflections don't go into the .opf file. That just contains configuration-type information.
Old 12-17-2013, 05:46 PM   #14
Difermo
Quote:
Originally Posted by susan_cassidy View Post
The inflections don't go into the .opf file. That just contains configuration-type information.
Where do they go?
Old 12-17-2013, 07:09 PM   #15
susan_cassidy
Into the .html file.