New dictionary format of firmware 2.14 - Page 8

gouni · 11-24-2012, 02:51 PM

I am taking a break.

The French character encoding is giving me a bug.

I have trouble with é and è in my UTF-8 test.

tshering · 11-24-2012, 03:16 PM

Quote:

Originally Posted by gouni

I am taking a break.

The French character encoding is giving me a bug.

I have trouble with é and è in my UTF-8 test.

You are right. After hard work it is best to have a relaxing break. After you have recovered, make sure that the last line in the index file is followed by LF. If there are problems with the character encoding please give more details.
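The check for the final LF can be automated. A minimal Python sketch, assuming the index is a plain text file; the function name `ensure_trailing_lf` is made up for this illustration:

```python
from pathlib import Path


def ensure_trailing_lf(path):
    """Append one LF if the file is non-empty and does not already end with LF.

    Returns True if the file was changed, False otherwise.
    """
    data = Path(path).read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    # Append in binary mode so that nothing else in the file is touched.
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

Calling `ensure_trailing_lf("index.txt")` before building the index should be harmless if the file is already fine.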

gouni · 11-24-2012, 03:34 PM

I don't understand:

my index.txt

allocation = 1
allocution = 2
allodial = 0
allographe = 3
allogène = -1 with marisa-lookup

tshering · 11-24-2012, 04:06 PM

I am not sure, but I guess your index file is OK. The problem might be caused by the encoding of the Windows terminal. Type

Code:

 marisa-predictive-search words

At the marisa-lookup prompt type

Code:

a

and hit Enter. All five words should then be listed.
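To see why a console can be the culprit, here is a small Python sketch. It assumes the console uses the legacy code page cp850 (common on French Windows, but check yours with `chcp`); the function names are invented for this example.

```python
def as_seen_in_console(text, console_encoding="cp850"):
    """Return what UTF-8 bytes look like when decoded with a legacy code page."""
    return text.encode("utf-8").decode(console_encoding, errors="replace")


def undo_console_view(garbled, console_encoding="cp850"):
    """Reverse as_seen_in_console for text that survived the trip intact."""
    return garbled.encode(console_encoding).decode("utf-8")


if __name__ == "__main__":
    word = "allogène"
    # The UTF-8 index stores è as two bytes, so the console shows two signs.
    print(as_seen_in_console(word))
    # A word typed at the console arrives in the console's own encoding,
    # so its bytes differ from the UTF-8 bytes stored in the index.
    print(word.encode("cp850") == word.encode("utf-8"))
```

The second print shows that the bytes typed in the console are not the bytes stored in the index, which would explain a failed lookup for a word that is actually present.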

gouni · 11-24-2012, 04:36 PM

Yeeeeeees, a thousand thanks, tshering.

It's the Windows console that is the problem.

Well, I get my five words; only my è is shown as another sign.

Just one question: my index is in UTF-8 with BOM.

Should my dictionary also be in UTF-8, or in UTF-8 with BOM?

gouni · 11-24-2012, 04:41 PM

It is getting late, so I'll rest. I am happy,

and thank you for your help.

Tomorrow I'll move forward and make the files aa, ab, ac, ad, ae ... zz. For the moment I am going to sleep.

tshering · 11-24-2012, 05:29 PM

I hope you had a refreshing sleep!

As for the format of the index, it is UTF-8 without BOM. In case of the html files, you can use both, but I would stick to UTF-8 without BOM.

Before you start making the files aa ab and so on, try to make a small dictionary with just one html in order to check whether it is working. Make sure that you have an epub with the corresponding words so that you can check the dictionary function easily.
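If an editor has saved the index as UTF-8 with BOM, the three BOM bytes can be removed afterwards. A minimal Python sketch; `strip_utf8_bom` is a name chosen for this example:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file, if there is one.

    Returns True if a BOM was removed, False if the file had none.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the first three bytes (EF BB BF).
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

For example, `strip_utf8_bom("index.txt")` leaves a BOM-free file untouched and returns False.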

gouni · 11-25-2012, 07:45 AM

My dictionary test works very well.

A hundred thanks to you all.

This time I kept the original HTML tags <g> <i>, separators and signs. It looks very beautiful.

I still need your help with Notepad++:
how do I select lines 868442 to 920568 to copy and paste them?

I can get there with the go-to-line function (line xxxxx).

The completed dictionary will be yours.

tshering · 11-25-2012, 10:56 AM

Quote:

Originally Posted by gouni

It looks very beautiful.

I am glad for you!

Quote:

Originally Posted by gouni

I still need your help with Notepad++:
how do I select lines 868442 to 920568 to copy and paste them?

I downloaded Notepad++ yesterday for the first time, so I do not know much about it.

You could try something different. Save the following code as dictlines.bat into the folder where your text file is.

Code:

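@echo off
rem Print lines FROM_LINE..TO_LINE (1-based) of INPUT_FILENAME to standard output.
if [%1] == [] goto usage
if [%2] == [] goto usage
if [%3] == [] goto usage
setlocal EnableDelayedExpansion
rem Start at 1 so that the counter matches the line number.
set /a counter=1
rem The odd quoting below disables eol and delims so that each line is read whole.
rem for /f skips empty lines, so they are not counted.
rem The comparisons are unquoted so that cmd compares numbers, not strings.
for /f ^"usebackq^ eol^=^

^ delims^=^" %%a in (%3) do (
        if !counter! GTR %2 goto :eof
        if !counter! GEQ %1 echo %%a
        set /a counter+=1
)
goto :eof
:usage
echo Usage: dictlines.bat FROM_LINE TO_LINE INPUT_FILENAME > RESULT_FILENAME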

At the command prompt write for instance:

Code:

dictlines 1 4 "mydic.txt" > ét.txt

and hit ENTER. This will write the lines 1 to 4 into the file ét.txt.

But producing the whole dictionary in this way is too much manual work and too time-consuming. I was hoping ShellShock would help with a piece of C code at this point.
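For very large files, the same line-range extraction can be sketched in Python. This is only an illustration, not a replacement for the C code hoped for above. The function name `copy_line_range` is made up, and the file is assumed to be UTF-8. Unlike `for /f` in a batch file, this version counts empty lines too.

```python
from itertools import islice


def copy_line_range(src, dst, first, last):
    """Copy lines first..last (1-based, inclusive) from src to dst.

    Returns the number of lines written.
    """
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    written = 0
    # newline="" keeps the original line endings unchanged.
    with open(src, "r", encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        # islice streams the file, so only the requested lines are kept in memory.
        for line in islice(fin, first - 1, last):
            fout.write(line)
            written += 1
    return written
```

For instance, `copy_line_range("mydic.txt", "ét.txt", 868442, 920568)` would copy that block in one go.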

tshering · 11-25-2012, 11:11 AM

Replace

Code:

INPUT_FILENAME >

by

Code:

INPUT_FILENAME ^>

to make the Usage part work.

gouni · 11-25-2012, 12:06 PM

Thanks, I'll try to digest all this.

gouni · 11-29-2012, 11:03 AM

Hello

Starting from the file index.txt,

marisa-build -owords index.txt

gives the file words.

I want to go the reverse way:

I have a words file and I want to recover the index file.

Is it possible to get the index back from a words file built by marisa?

tshering · 11-29-2012, 02:08 PM

Quote:

Originally Posted by gouni

Hello

Starting from the file index.txt,

marisa-build -owords index.txt

gives the file words.

I want to go the reverse way:

I have a words file and I want to recover the index file.

Is it possible to get the index back from a words file built by marisa?

Unfortunately the marisa tools do not offer a possibility to dump the marisa-file as a whole. With marisa-reverse-lookup you can retrieve the content line by line by typing the line number at the marisa prompt. This, of course, is time consuming in case of large files.
One thing you can do is the following. Type

Code:

marisa-predictive-search -n0 words > index.txt

Then type at the marisa prompt:

Code:

a ENTER
b ENTER
c ENTER

and so on.
This will write all entries starting with a, b, c and so on to index.txt, together with some additional information. You can easily edit the file and remove the additional information. Be aware that marisa is case-sensitive, so capital and accented initial letters have to be queried separately.
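Removing the additional information can also be scripted. The Python sketch below rests on an assumption: that each result line has the form ID, TAB, key, TAB, query, and that all other lines (such as a count of matches) carry no tab. Please compare this with your actual output before relying on it. Both function names are invented for this example.

```python
import string


def query_lines(alphabet=string.ascii_lowercase + string.ascii_uppercase):
    """Build the text to feed to the prompt: one prefix per line."""
    return "".join(ch + "\n" for ch in alphabet)


def extract_keys(output_lines):
    """Keep only the key column from lines shaped like ID<TAB>key<TAB>query.

    Lines without a numeric first field are treated as extra information
    and skipped. Keys are returned once each, in order of first appearance.
    """
    keys = []
    seen = set()
    for line in output_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        key = fields[1]
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys
```

The default alphabet covers only unaccented letters, so initials such as é would have to be added by hand.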

gouni · 11-30-2012, 01:18 PM

Thanks, Tshering

Papi · 12-01-2012, 02:44 AM

Very interesting thread. I'd like to build my own dictionaries too, but with commercial dictionaries as sources. Back when I was in the Kindle world, I bought some Mobipocket dictionaries that I'd love to be able to use on the Kobo. Has anyone tried something like that?

Also, in French we have something very annoying for dictionaries, and it is indeed not handled by the French Larousse dictionary found on the Kobo: s', l', m', t', which can precede a verb or a noun. For example, abris -> l'abris. If I put l'abris as a variant of abris, as far as I understand, it won't work (just as it didn't with go/went). Maybe a file l'.html would work (as shown in the o'clock example), but it would contain a lot of words, basically all the nouns and verbs starting with a vowel.
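To get a feeling for how many extra entries this would mean, here is a small Python sketch that lists the elided forms for one headword. It is purely illustrative and says nothing about whether the Kobo would accept such entries. The prefix list is the one from the post (French has more, such as d', n', j' and qu'), a mute h is ignored, and the names `VOWELS` and `elided_forms` are made up.

```python
# Initial letters that trigger elision in this simplified sketch.
VOWELS = set("aeiouyàâéèêëîïôùûœ")


def elided_forms(headword, prefixes=("s'", "l'", "m'", "t'")):
    """Return the headword with each elided prefix attached.

    Words that do not start with a vowel get no elided forms.
    """
    if not headword or headword[0].lower() not in VOWELS:
        return []
    return [prefix + headword for prefix in prefixes]


if __name__ == "__main__":
    for word in ("abris", "chat", "école"):
        print(word, elided_forms(word))
```

Every vowel-initial headword gets four extra forms here, which shows how quickly such a file would grow.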

