#106 |
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
|
I am taking a break.
I have hit a character-encoding bug with French: the é and è give me trouble in my UTF-8 test.
#107 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
|
You are right. After hard work it is best to have a relaxing break. After you have recovered, make sure that the last line in the index file is followed by LF. If there are problems with the character encoding please give more details.
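The LF check mentioned above can be sketched in Python. This is only an illustration: the helper names `ends_with_lf` and `ensure_trailing_lf` are made up for this example, and the file name index.txt is taken from the thread.

```python
from pathlib import Path


def ends_with_lf(path):
    """Return True if the last byte of the file is LF (0x0A)."""
    return Path(path).read_bytes().endswith(b"\n")


def ensure_trailing_lf(path):
    """Append a single LF if the file does not already end with one."""
    p = Path(path)
    data = p.read_bytes()
    if not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
```

Calling `ensure_trailing_lf("index.txt")` before running marisa-build would cover the case described above. Working on bytes avoids any guess about the text encoding.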
#108 |
Connoisseur
|
Here is my index.txt, with the results from marisa-lookup:
allocation = 1
allocution = 2
allodial = 0
allographe = 3
allogène = -1
#109 |
Wizard
|
I am not sure, but I guess your index file is OK. The problem might be caused by the encoding of the Windows terminal. Type
Code:
marisa-predictive-search words
and then enter
Code:
a
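To see why a console can make a correct UTF-8 file look broken, here is a small Python sketch. It assumes the console uses code page 850, which is common on Western European Windows but is not confirmed anywhere in this thread; the function names are invented for the illustration.

```python
def as_seen_in_console(text, console_encoding="cp850"):
    """Return how UTF-8 bytes look when decoded with a legacy console code page.

    ASSUMPTION: cp850 is only a guess at the console code page.
    """
    return text.encode("utf-8").decode(console_encoding, errors="replace")


def undo_console_view(garbled, console_encoding="cp850"):
    """Reverse the mis-decoding to recover the original UTF-8 text."""
    return garbled.encode(console_encoding).decode("utf-8")
```

`as_seen_in_console("allogène")` returns a string that is one character longer than the input, because the two UTF-8 bytes of è are shown as two separate console characters. The file itself is unchanged; only the display is wrong.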
#110 |
Connoisseur
|
Yeeeeeees! A thousand thanks, tshering.
It is the Windows console that is the problem. I do get my five words; only my è is displayed as a different sign. Just one question: my index is in UTF-8 with BOM. Should my dictionary also be in UTF-8, or in UTF-8 with BOM?
#111 |
Connoisseur
|
It is getting late, so I'll rest. I am happy.
Tomorrow I'll move forward and make the files aa, ab, ac, ad, ae ... zz. For now, I am going to sleep.
#112 |
Wizard
|
I hope you had a refreshing sleep!
As for the format of the index, it is UTF-8 without BOM. For the HTML files you can use either, but I would stick to UTF-8 without BOM. Before you start making the files aa, ab and so on, try to make a small dictionary with just one HTML file in order to check whether it works. Make sure that you have an epub containing the corresponding words so that you can check the dictionary function easily.
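If an editor has saved the index with a BOM, the three BOM bytes can be removed afterwards. The following Python sketch shows one way to do it; the helper names are hypothetical, and the file is rewritten in place, so try it on a copy first.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM (EF BB BF)."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)


def strip_utf8_bom(path):
    """Rewrite the file without its leading UTF-8 BOM, if it has one."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        p.write_bytes(data[len(codecs.BOM_UTF8):])
```

For example, `strip_utf8_bom("index.txt")` leaves a file that already has no BOM untouched.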
#113 |
Connoisseur
|
My dictionary test works very well.
This time I kept the original HTML tags <g> and <i>, the separators and the signs. The presentation looks very nice. I still need your help with Notepad++: how do I select lines 868442 to 920568 to copy and paste them? I know how to jump to a line xxxxx with the go-to-line function. The completed dictionary will be yours.
#114 |
Wizard
|
I am glad for you!
You could try something different. Save the following code as dictlines.bat into the folder where your text file is.
Code:
@echo off
rem Print lines FROM_LINE to TO_LINE of INPUT_FILENAME; the counter starts at 0.
if [%1] == [] goto usage
if [%2] == [] goto usage
if [%3] == [] goto usage
setlocal EnableDelayedExpansion
set /a counter=0
rem The escaped option string sets eol to a line feed, so no line is dropped as a comment.
rem Note that for /f still skips empty lines.
for /f ^"usebackq^ eol^=^

^ delims^=^" %%a in (%3) do (
  rem Unquoted operands make GTR and GEQ compare numbers instead of strings.
  if !counter! GTR %2 goto :eof
  if !counter! GEQ %1 echo %%a
  set /a counter+=1
)
goto :eof
:usage
echo Usage: dictlines.bat FROM_LINE TO_LINE INPUT_FILENAME > RESULT_FILENAME
Then run it, for example:
Code:
dictlines 1 4 "mydic.txt" > ét.txt
But producing the whole dictionary in this way is too much manual work and too time-consuming. I was hoping ShellShock would help with a piece of C code at this point.
Last edited by tshering; 11-25-2012 at 11:02 AM.
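For comparison, the same line-range extraction can be sketched in Python. This is an alternative illustration, not the batch file's exact behaviour: the function name `copy_line_range` is invented, line numbers are counted from 1, and empty lines are kept and counted.

```python
from itertools import islice


def copy_line_range(src, dst, first, last, encoding="utf-8"):
    """Copy lines first..last (1-based, inclusive) from src to dst.

    Returns the number of lines written. Line endings are preserved as-is.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    count = 0
    # newline="" turns off newline translation on both read and write.
    with open(src, encoding=encoding, newline="") as fin, \
            open(dst, "w", encoding=encoding, newline="") as fout:
        for line in islice(fin, first - 1, last):
            fout.write(line)
            count += 1
    return count
```

With the numbers from the question above, the call would be `copy_line_range("mydic.txt", "out.txt", 868442, 920568)`, where out.txt is a placeholder name. The file is read line by line, so a large dictionary does not need to fit in memory.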
#115 |
Wizard
|
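Replace
Code:
INPUT_FILENAME >
with
Code:
INPUT_FILENAME ^>
in the usage line of dictlines.bat, so that the > is printed as text instead of being treated as a redirection.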
#116 |
Connoisseur
|
Thanks I'll try to digest all this.
#117 |
Connoisseur
|
Hello,
Starting from the file index.txt, the command marisa-build -owords index.txt gives the file words. I want to go the reverse way: I have a words file and I want to recover the index file. Is it possible to get the index back from the words file with marisa?
#118 |
Wizard
|
One thing you can do is the following. Type
Code:
marisa-predictive-search -n0 words > index.txt
and then enter
Code:
a ENTER
b ENTER
c ENTER
This will output all entries starting with a, b, c and so on, together with additional information, to index.txt. You can easily edit the file and remove the additional information. Be aware that marisa is case sensitive.
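Removing the additional information could also be scripted. The Python sketch below rests on an assumption that should be checked against the real output first: that each result line has the form ID, TAB, KEY, TAB, QUERY, and that status lines such as a result count contain no tab. The function name is made up for this example.

```python
def keys_from_search_output(text):
    """Extract the keys from captured marisa-predictive-search output.

    ASSUMPTION: result lines look like 'ID<TAB>KEY<TAB>QUERY'. Lines whose
    first tab-separated field is not a number are treated as status lines
    and skipped.
    """
    keys = set()
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0].strip().isdigit():
            keys.add(fields[1])
    # Sorted by code point; duplicates from overlapping queries are merged.
    return sorted(keys)
```

The returned list can then be written back with one key per line, in UTF-8 without BOM and with a final LF. If the real output uses a different layout, only the field test inside the loop needs to change.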
#119 |
Connoisseur
|
Thanks, tshering.
#120 |
Addict
Posts: 311
Karma: 547600
Join Date: Jul 2010
Location: Paris
Device: Kindle Keyboard, Kindle NT, PRS-650
|
Very interesting thread. I'd like to build my own dictionaries too, but with commercial dictionaries as sources. Back when I was in the Kindle world, I bought some Mobipocket dictionaries that I'd love to be able to use on the Kobo. Has anyone tried something like that?
Also, in French we have something very annoying for dictionaries, and it is indeed not handled by the French Larousse dictionary found on the Kobo: the elided forms s', l', m', t' that can precede a verb or a noun. For example, abris -> l'abris. If I put l'abris as a variant of abris, then as far as I understood it won't work (just as it didn't with go/went). Maybe a file l'.html would work (as shown in the o'clock example), but it would contain a lot of words, basically all the nouns and verbs starting with a vowel.