#151
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Doesn't the current code already do that? Other scripts will use characters with ord() > 127, and if ord(x) > 127, character x is considered OK.
In other words, right now a keyword goes into 11.html if and only if its 2-character prefix contains an ASCII character x with ord(x) <= 127 that is not a letter. I have tested that, in this case, the keyword is correctly retrieved. Am I missing something?

#152
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
I did not look at your code. I was only guessing that your code would implement the idea:
Quote:
Last edited by tshering; 12-09-2012 at 11:38 AM.

#153
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Ah, OK. No, the code is supposed to do what you asked for:
1) If the keyword is only 1 character long, the code appends an "a" to it.
2) Then it looks at the first two characters of the keyword. If both are either ASCII letters or characters with ord() > 127, the keyword goes to the corresponding ??.html file. Otherwise, it goes to 11.html.
There might still be a small "slack", for example in the case of "o'clock" mentioned earlier. But I tested that putting "o'clock" into 11.html works, so I guess the current code is fine.
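The two steps can be sketched in Python as follows. This illustrates the rule as described in this post, not Penelope's actual code: the function names are hypothetical, and no case folding or other normalization of the prefix is applied, since the thread does not say whether Penelope does any.

```python
import string


def is_ok_char(ch: str) -> bool:
    """ASCII letters and any character with ord() > 127 count as "ok"."""
    return ch in string.ascii_letters or ord(ch) > 127


def html_file_for(keyword: str) -> str:
    """Pick the ??.html file for a keyword, following steps 1) and 2)."""
    if not keyword:
        raise ValueError("empty keyword")
    # Step 1: a 1-character keyword gets an "a" appended.
    if len(keyword) == 1:
        keyword += "a"
    # Step 2: only the first two characters decide the file.
    prefix = keyword[:2]
    if all(is_ok_char(ch) for ch in prefix):
        return prefix + ".html"
    # Any ASCII non-letter in the prefix (digit, apostrophe, hyphen, ...).
    return "11.html"
```

For example, "x" lands in xa.html, "été" in ét.html, and "o'clock" in 11.html because of the apostrophe.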

#154
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
I would like to mention that entries with identical keywords (and different definitions) have to follow one another directly. Otherwise, the Kobo displays only the first occurrence. For "economy" reasons, I did not check for this with my Japanese-English dictionary, so now I have to redo it.
Last edited by tshering; 12-09-2012 at 11:43 AM.
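A quick way to test for this condition, sketched in Python under the assumption that you can list each entry's keyword in the order the entries are written out (the function name is hypothetical):

```python
from typing import Iterable, Optional, Set


def scattered_keywords(keywords: Iterable[str]) -> Set[str]:
    """Return the keywords whose entries do not form one consecutive run.

    `keywords` lists the keyword of every entry, in output order.
    """
    started: Set[str] = set()    # keywords whose run has already begun
    scattered: Set[str] = set()
    previous: Optional[str] = None
    for keyword in keywords:
        if keyword != previous:      # a new run starts here
            if keyword in started:   # ...but this keyword had a run before
                scattered.add(keyword)
            started.add(keyword)
        previous = keyword
    return scattered
```

A non-empty result means some keyword is split across non-adjacent entries, so on the Kobo only its first occurrence would be shown; a stable sort by keyword is one way to regroup them.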

#155
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Penelope sorts the keys (i.e., the keywords) before processing and outputting them, so the code should be fine with respect to this issue.
Also, in the current version I wrapped the content of each word+definition in a <div>, so that two identical keywords with different definitions are displayed cleanly, one below the other.
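A minimal sketch of those two measures in Python. The `<b>` markup and the function name are illustrative, not Penelope's actual output, and definitions are assumed to be plain text (hence the escaping). Python's `sorted` is stable, so entries that share a keyword also keep their original relative order.

```python
import html
from typing import List, Tuple


def render_entries(entries: List[Tuple[str, str]]) -> str:
    """Sort (keyword, definition) pairs and wrap each in its own <div>."""
    # Stable sort on the keyword only: identical keywords become adjacent
    # and keep the order in which they were given.
    ordered = sorted(entries, key=lambda entry: entry[0])
    blocks = []
    for keyword, definition in ordered:
        # One <div> per word+definition, so two entries with the same
        # keyword render one below the other instead of running together.
        blocks.append(
            f"<div><b>{html.escape(keyword)}</b> {html.escape(definition)}</div>"
        )
    return "\n".join(blocks)
```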

#156
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Today I ran into a problem. I made a French dictionary and, to my disappointment, none of the words with an accented vowel in their first two letters showed up. The reason was evidently the encoding of the filenames inside the zip file. This surprised me, since my Japanese dictionaries work perfectly, so why shouldn't the French one? Anyhow, I tried zipping the files under Ubuntu rather than under Windows, and the dictionary worked fine.
The culprit is a strange default behavior of 7-Zip under Windows (or maybe I should rather say the culprit was me not being aware of it): 7-Zip encodes in UTF-8 only those filenames that contain characters not supported by the local codepage. Therefore, under Windows one has to enforce UTF-8 encoding:
Code:
7z a -tzip -mcu=on dicthtml.zip *.html words
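The default described above also explains why the Japanese dictionaries worked while the French one did not. The Python model below is not 7-Zip's code, and the function names are hypothetical. It assumes a Western Windows codepage (cp1252 by default here; the actual local or OEM codepage may differ, e.g. cp850, but é is representable in both), and it assumes, as the fix suggests, that the Kobo reads member names as UTF-8.

```python
def stored_name_encoding(name: str, codepage: str = "cp1252") -> str:
    """Model of 7-Zip's default (without -mcu=on) as described in the post:
    keep the local codepage if it can represent the name, else use UTF-8."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return "utf-8"
    return codepage


def read_back_as_utf8(name: str, codepage: str = "cp1252") -> str:
    """Decode the stored name bytes as UTF-8, as a UTF-8-only reader would."""
    raw = name.encode(stored_name_encoding(name, codepage))
    return raw.decode("utf-8", errors="replace")
```

Under this model, aé.html fits the codepage, so it is stored in codepage bytes and `read_back_as_utf8("aé.html")` yields a replacement character (U+FFFD) where the é was. A Japanese name such as 日本.html does not fit, so it is stored as UTF-8 and comes back unchanged.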

#157
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
Tshering,
here is my mini table of letters for French. Open it with Notepad++: http://www.mediafire.com/?ptmn10cplqr7kaa
Last edited by gouni; 12-12-2012 at 12:12 PM.

#158
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
I need your help using 7-Zip.
I use the Windows console. I have 300 files named xx.html (xx = ab, ac, ad, ae, ...). I want to compress each one with gzip. The names of my *.html files are in the Windows ANSI codepage, and I want to convert them to UTF-8 so that I get my 300 xx.html.gz files with accented UTF-8 characters in their names. What command should I use? Thanks.
aa.html -> aa.html.gz
aé.html -> aé.html.gz
ûf.html -> ûf.html.gz
...
zz.html -> zz.html.gz
Last edited by gouni; 12-12-2012 at 01:41 PM.

#159
Wizard
Posts: 1,178
Karma: 2431850
Join Date: Sep 2008
Device: IPad Mini 2 Retina
Quote:

#160
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Gouni,
there is no need to convert the filenames at the stage of creating the gz files. First do
Code:
for %i in (*.html) do 7z a -tgzip "gz\%i" "%i"
and then
Code:
7z a -tzip -mcu=on dicthtml.zip *.html words
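For comparison, here is a Python sketch of the same two steps. Assumptions not taken from the thread: the function names are hypothetical, the gzipped copies keep their .html names in a gz subfolder (as the `gz\%i` target of the 7z loop suggests), the path of the `words` file is passed in explicitly, and members are Deflate-compressed. Python reads Windows directory listings as Unicode, so names like aé.html need no conversion before zipping, and CPython's zipfile stores any non-ASCII member name as UTF-8 with the zip UTF-8 flag (general-purpose bit 11) set, which is the effect 7-Zip needs -mcu=on for.

```python
import gzip
import zipfile
from pathlib import Path
from typing import List

UTF8_NAME_FLAG = 0x800  # zip general-purpose flag bit 11: member name is UTF-8


def gzip_html_files(src_dir: Path, gz_dir: Path) -> List[Path]:
    """Step 1: gzip each *.html in src_dir into gz_dir, keeping the .html name."""
    gz_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src_dir.glob("*.html")):
        target = gz_dir / path.name  # same file name, gzip-compressed content
        target.write_bytes(gzip.compress(path.read_bytes()))
        written.append(target)
    return written


def build_dicthtml(files: List[Path], words_file: Path, out_zip: Path) -> None:
    """Step 2: pack the gzipped files plus the `words` file into one zip."""
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in [*files, words_file]:
            # zipfile encodes non-ASCII names as UTF-8 and sets bit 11 itself.
            zf.write(path, arcname=path.name)


def names_missing_utf8_flag(zip_path: Path) -> List[str]:
    """Non-ASCII member names stored without the UTF-8 flag (ideally none)."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.filename.isascii() and not info.flag_bits & UTF8_NAME_FLAG
        ]
```

Usage, with gouni's folder as an example: call `gzip_html_files(Path(r"e:\zztest"), Path(r"e:\zztest\gz"))`, pass the returned list to `build_dicthtml` together with the path of `words` and the output zip path, and check that `names_missing_utf8_flag` on the result returns an empty list.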

#161
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:

#162
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
Thank you, Tshering. Yes, that is part of the description of my problem. I did not know this, but I will follow your advice. I used Notepad++ and my text is in UTF-8, but I saved my files with UTF-8 names. I work with Windows; I don't know Unix, and I do not know programming. All my file names (éa, éb, ..., ûm .html) are in the Windows character set. I read
Last edited by gouni; 12-12-2012 at 09:18 PM.

#164
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
Thanks for giving me your time, Tshering. It is very difficult; I have never done this before. I got the second command working (it took me a day to understand, but I got it).
I now do this:
7z a -tzip -mcu=on e:\zzzarch e:\zztest\*.html
It puts all the files in zztest correctly into zzzarch.zip. But I do not succeed with the first command. My folder organisation on e: is:
e:\zztest = my folder with all the *.html files
7z" for %i in (*.html) do 7z a -tgzip" "gz\%i" "%i"
(*.html) =
Last edited by gouni; 12-13-2012 at 07:34 AM.

#165
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Do not replace "*.html" with "e:\zztest\*.html". Instead, change to this directory by executing
Code:
cd e:\zztest
and then run
Code:
for %i in (*.html) do 7z a -tgzip "gz\%i" "%i"
If you are not already on drive e:, execute this first:
Code:
e:
Last edited by tshering; 12-13-2012 at 07:40 AM.