#1 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
Need help about dictionary
Sorry if this is the wrong place to ask; I hope the moderators will move it where it belongs.
I need to create a Serbian-Serbian dictionary. I read all the posts here about creating dictionaries and I'm a little confused. I hope someone will help.
I have a book, a dictionary, that I want to scan. After scanning and OCR I will have a file named something like filename.docx. That file will have over 40,000 words. After editing (finding and fixing all the OCR mistakes) I save it as plain text (UTF-8). That gives me dictionary.txt.
WHAT NEXT?
If I understand correctly, from that dictionary.txt I have to create a tab-delimited file. Or dictionary.txt itself is the tab-delimited file, which I then have to run through a Python script that requires Python 2.7 (not 2.7.2). After that I will get dictionary.opf. After running MobiGen on dictionary.opf I will receive dictionary.mobi, which will be the final product.
1) Is the procedure OK?
2) What is a tab-delimited file?
3) Is StarDict a format in which all kinds of dictionaries are available online, which I would need to convert to dictionary.txt and then run through the procedure I asked about?
4) What about dictionary.prc? I see some people are creating dictionary.prc (not the .mobi extension). What is the difference between them, and does a Kindle device read both of them the same way?
5) MOST IMPORTANT QUESTION: Does the inflection code have to be on every word (more than 40,000), or does it have to be written at the beginning of dictionary.txt, or at the end of it? Is there any script to add all the inflections?
THANKS, AND SORRY FOR MY BAD ENGLISH
#2 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
You have to know the inflections to code them. For a mobipocket dictionary, you have to follow the documentation given here: http://www.mobipocket.com/dev/articl...e=indexing.htm. It requires a lot of extra HTML coding. A .prc file should be about the same as a .mobi file, usually. .mobi would be preferred. I don't know if a Kindle, for example, would allow a dictionary with a .prc suffix. The dictionary would not be tab-delimited, but all in HTML.
Tab-delimited format is where each field is separated by tabs. Sometimes the text fields are quoted, in addition. There can be issues if the text field has newlines or tabs in it. Not sure what the rules are for that. Google is your friend. Maybe you use backslash quoting for things like that, like \t instead of tab, \n instead of newline, etc., depending on what is going to be reading the file. |
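To make the idea of a tab-delimited file concrete, here is a minimal Python sketch. It assumes the simplest layout, one entry per line with a headword and a definition separated by a single tab; the sample text and the function name `read_entries` are made up for illustration and do not come from any particular tool.

```python
import csv
import io

# Hypothetical sample: one entry per line, headword and definition
# separated by a single tab character.
SAMPLE = (
    "book\ta written or printed work\n"
    "go\tmove from one place or point to another\n"
)


def read_entries(text):
    """Return a list of (headword, definition) tuples from tab-delimited text."""
    # QUOTE_NONE treats quotation marks as ordinary characters,
    # which suits plain dictionary text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank lines and lines without exactly one tab
        entries.append((row[0].strip(), row[1].strip()))
    return entries


for headword, definition in read_entries(SAMPLE):
    print(headword, "->", definition)
```

Lines that do not split into exactly two fields are skipped here, which is a quick way to spot-check an OCR'd file before feeding it to anything else.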
#3 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
Hmm, I read that site but still don't understand. Does every word have to be in <idx
"-"suptraktivan (nlat. subtilitas) koji se moze odbiti ili odracunati
"-"suptrakcija (nlat. subtractio) mat. odvijanje, oduzimanje
"-"suptrahend (lat. subtrahendus) broj koji treba oduzeti od drugog broja
#4 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
Yes, every word must be done separately with the idx tags. And the inflections, if any, have to be done the way the example on the web page I referenced does it. I don't get your fascination with tabs; there are no tabs used in HTML.
|
#5 |
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
A) Use lots of search-and-replace operations (or a custom script) to add the required Mobipocket dictionary tags, manually create an .opf file with file references, and open it with MobiGen.exe or KindleGen.exe to generate the dictionary. Each entry in the HTML source file should look more or less like this (I added spaces to make the code more readable; they're optional):
Code:
<html>
<body>
<idx:entry>
  <b><idx:orth>book
    <idx:infl>
      <idx:iform value="books"/>
    </idx:infl>
  </idx:orth></b>
  <i>noun</i> <br/>
  a written or printed work
</idx:entry>
<br/><br/>
<hr/>
<idx:entry>
  <b><idx:orth>go
    <idx:infl>
      <idx:iform value="goes"/>
      <idx:iform value="going"/>
      <idx:iform value="went"/>
      <idx:iform value="gone"/>
    </idx:infl>
  </idx:orth></b>
  <i>verb</i> <br/>
  move from one place or point to another
</idx:entry>
<br/><br/>
</body>
</html>
B) Create a tab-delimited text file in which each line has the following format:
Code:
Headword<TAB>Definition<CR/LF>
Once you've done that, you can use tab2opf.py to generate the source files required for MobiGen/KindleGen. Alternatively, you could also use files provided on this website.
You may also want to read the Kindle Dictionary FAQ, which will answer many of your questions. |
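For anyone who would rather script the tagging than do it with search and replace, the following is a rough Python sketch of the idea. It is not tab2opf.py and does not reproduce its output; the function name `entry_to_html` and its parameters are invented for illustration, and the markup only follows the shape of the entries shown in this post.

```python
from html import escape


def entry_to_html(headword, definition, inflections=()):
    """Build one <idx:entry> block for a headword.

    `inflections` is an optional list of inflected forms; each one becomes
    an <idx:iform> inside <idx:infl>, nested in the <idx:orth> element.
    """
    iforms = "".join(
        '<idx:iform value="%s"/>' % escape(form, quote=True)
        for form in inflections
    )
    # Only emit <idx:infl> when there is at least one inflected form.
    infl = "<idx:infl>%s</idx:infl>" % iforms if iforms else ""
    return "<idx:entry><b><idx:orth>%s%s</idx:orth></b><br/>%s</idx:entry>" % (
        escape(headword),
        infl,
        escape(definition),
    )


print(entry_to_html("book", "a written or printed work", ["books"]))
print(entry_to_html("go", "move from one place or point to another",
                    ["goes", "going", "went", "gone"]))
```

The point of the sketch is that inflections are attached per headword, so a 40,000-word list needs the inflected forms supplied as data for each word that has them; nothing at the top or bottom of the file covers all entries at once.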
#6 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
new tab = new line (Enter key)
So where do I place the inflections? Can you (or anybody else) show how these 3 words should be written? SUPTRA should be the inflection; suptraktivan, suptrakcija, and suptrahend are the words. (nlat. subtilitas), (nlat. subtractio), and (lat. subtrahendus) are the Latin meanings, and the rest is the description of the word.
#7 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
@Doitsu
Thanks. Since I didn't refresh the page, I didn't notice you had posted a comment (I was writing a new one). I understand almost everything; I just have to ask something about inflections. So after creating the .opf file from dictionary.txt (without inflections), I can later open it and insert all the inflections manually? The code you showed is like what I practice in Sigil! Do I write it there?
#8 |
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
It looks like what you're looking for are derived words. If that's what you mean, simply wrap them in <idx:orth>...</idx:orth> tags, too. (Each <idx:entry>...</idx:entry> can contain multiple <idx:orth>...</idx:orth> entries.) Unfortunately, tab2opf.py cannot handle this; you'll have to manually edit the .html files that it creates to group related dictionary entries under the headword that they're derived from.
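As a hedged illustration of that grouping, here is a small Python sketch that puts the three sample words from earlier in the thread into a single <idx:entry>, each wrapped in its own <idx:orth>. The function name `group_entry` and the exact layout (bold headword, line break after each definition) are assumptions for the example, not something tab2opf.py produces.

```python
from html import escape


def group_entry(words):
    """Return one <idx:entry> holding several (headword, definition) pairs.

    Each headword gets its own <idx:orth> element inside the shared entry.
    """
    parts = ["<idx:entry>"]
    for headword, definition in words:
        parts.append(
            "<b><idx:orth>%s</idx:orth></b> %s<br/>"
            % (escape(headword), escape(definition))
        )
    parts.append("</idx:entry>")
    return "\n".join(parts)


# Sample words taken from the thread; the text is kept exactly as posted.
RELATED = [
    ("suptraktivan", "(nlat. subtilitas) koji se moze odbiti ili odracunati"),
    ("suptrakcija", "(nlat. subtractio) mat. odvijanje, oduzimanje"),
    ("suptrahend", "(lat. subtrahendus) broj koji treba oduzeti od drugog broja"),
]

print(group_entry(RELATED))
```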
|
#9 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
Obviously, I was talking about HTML as source for the dictionary. It would have to be compiled into a usable format.
|
#10 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
I understand inflections now; I just need to understand the editing and I'm ready to go.
|
|
#11 |
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
That's correct, but Kindle dictionary source files contain non-standard tags that Sigil cannot handle, e.g. <idx:entry>. Since Sigil usually deletes all non-standard tags, you'll lose a lot of work, if you open the .html files generated by tab2opf.py with it.
dictionary.opf is an XML project file that can be edited with any text editor. |
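To show what "XML project file" means in practice, here is a small Python sketch that reads a stripped-down .opf and lists what it contains. The sample below is an assumption about the general layout of a dictionary project (language settings in the metadata, a manifest pointing at the .html files); the file name dictionary0.html and the function name `summarize_opf` are hypothetical, and a real generated file may contain more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down project file. Note that it only holds
# settings and file references; there are no idx tags in it.
SAMPLE_OPF = """<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:Title>dictionary</dc:Title>
      <dc:Language>sr</dc:Language>
    </dc-metadata>
    <x-metadata>
      <DictionaryInLanguage>sr</DictionaryInLanguage>
      <DictionaryOutLanguage>sr</DictionaryOutLanguage>
    </x-metadata>
  </metadata>
  <manifest>
    <item id="dictionary0" href="dictionary0.html" media-type="text/x-oeb1-document"/>
  </manifest>
  <spine>
    <itemref idref="dictionary0"/>
  </spine>
</package>"""


def summarize_opf(opf_text):
    """Return the dictionary languages and the content files an .opf refers to."""
    root = ET.fromstring(opf_text)
    return {
        "in_language": root.findtext("metadata/x-metadata/DictionaryInLanguage"),
        "out_language": root.findtext("metadata/x-metadata/DictionaryOutLanguage"),
        "content_files": [item.get("href") for item in root.iter("item")],
    }


print(summarize_opf(SAMPLE_OPF))
```

The entries themselves, including any inflections, live in the .html files named in the manifest, so that is where the editing happens.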
#12 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
So after creating dictionary.opf from dictionary.txt, I can open dictionary.opf with Notepad++ and edit all the inflections?
|
#13 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
The inflections don't go into the .opf file. That just contains configuration-type information.
|
#14 |
Member
Posts: 20
Karma: 10
Join Date: Nov 2013
Device: none
|
|
#15 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
Into the .html file.
|