Old 11-27-2024, 08:30 AM   #286
stopchan
Member
Posts: 14
Karma: 464
Join Date: Nov 2024
Device: PocketBook 700 Era 16gb
Can somebody convert these en-uk dictionaries (the same dictionaries in different formats) to *.dic format for me?
https://github.com/bakustarver/ukr-d...8%20(1907-1909)

P.S. It's quite difficult for me to convert them because of my limited knowledge of how to work on the command line.
Old 12-07-2024, 08:44 AM   #287
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Code:
...
Total number of articles processed $ar = 78703.
Done at 20241207 14:39:30
Removing ' '. This will take some time. 20241207 14:39:31
Result convertNumberedSequencesToChar: 27-> ''' (x13501)
length html before removeInvalidChars is 47970949
Removing invalid characters. This will take some time. 20241207 14:39:31
Done at 20241207 14:39:31
Converting <blockquote-tags to <div style:"margin 0 0 0 1em;">-tags. This will take some time. 20241207 14:39:32
Running system command:"WINEDEBUG=-all wine converter.exe "/home/mark/Downloads/PocketbookDic/dict/en-ukr_Balla/eng-ukr_Balla_v1.3_reconstructed.xdxf" eng"
Loading collates...
Loading morphems...
Loading keyboard...
Loading dictionary file...
/home/mark/Downloads/PocketbookDic/dict/en-ukr_Balla/eng-ukr_Balla_v1.3_reconstructed.xdxf, line 282: unclosed xml tag
wine: Unhandled page fault on write access to 00000000 at address 00404403 (thread 0024), starting debugger...
WineDbg attached to pid 0020
Unhandled exception: page fault on write access to 0x00000000 in wow64 32-bit code (0x00404403).
Register dump:
...
Balla (en-ukr) doesn't convert without errors. Usually this is due to the use of ">" or "<" in the text, which is then parsed as an unclosed XML tag. I am not going to debug it for you. However, as the messages show, the faulty code is around line 282. Good luck with the bug hunting.
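A rough way to hunt for the offending spot is to scan for a "<" that is never closed by a ">" on the same line. The Python sketch below does that. The function name `suspicious_lines` and the one-line heuristic are assumptions for illustration; this is not how converter.exe parses the file.

```python
import re

# Heuristic (an assumption, not the converter's real logic): a '<' that runs
# into the next '<' or the end of the line without meeting a '>' is suspect.
UNCLOSED = re.compile(r"<[^<>]*(?=<|$)")


def suspicious_lines(lines, start=1):
    """Yield (line_number, fragment) for every '<' that is never closed."""
    for number, line in enumerate(lines, start):
        for match in UNCLOSED.finditer(line.rstrip("\n")):
            yield number, match.group(0)


# Two sample lines modelled on the XDXF snippet in this thread.
sample = [
    '<head><k>A one</k></head>',
    '<i class="p"><font color="green">adj</font></i&gt; <i class="p">x</i>',
]

for number, fragment in suspicious_lines(sample, start=278):
    print(number, repr(fragment))
```

For a real file, an open file handle (opened with `encoding="utf-8"`) can be passed instead of `sample`.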

Last edited by Markismus; 12-09-2024 at 06:46 AM.
Old 12-08-2024, 06:53 AM   #288
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Looking at the lines, it seems to be an issue with escape characters:
Code:
$ sed -n '275,285p' eng-ukr_Balla_v1.3_reconstructed.xdxf 
</ar>
<ar>
<head><k>A one</k></head><def><div style="margin-left:1em"><i class="p"><font color="green">adj</font></i&gt; <i class="p"><font color="green">амер.</font></i&gt;<i class="p"><font color="green">,</font></i&gt; <i class="p"><font color="green">розм.</font></i&gt;</div>
<div style="margin-left:1em">першокласний, відмінний</div></def>
</ar>
<ar>
<head><k>a posteriori</k></head><def><div style="margin-left:1em"><i class="p"><font color="green">лат.</font></i&gt;</div>
[m1]<font color="darkred"><b&gt;1.</b></font> <i class="p"><font color="green">adj</font></i&gt;
<div style="margin-left:1em">апостеріорний, заснований на досвіді</div>
[m1]<font color="darkred"><b&gt;2.</b></font> <i class="p"><font color="green">adv</font></i&gt;
<div style="margin-left:1em">апостеріорі, емпірично, з досвіду</div></def>
I've switched a few toggles that impact unescaping HTML-characters and the Koreader optimized version looks good. You'll have to test the dic-file yourself.
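For anyone who wants to patch a source file by hand instead, the damage in the snippet above is a tag whose closing ">" was escaped to "&gt;". The Python sketch below undoes only that pattern. It is a heuristic under stated assumptions (the name `repair_tag_ends` is made up, and this is not the toggle change made in the script).

```python
import re

# '<', an optional '/', a tag name plus any attributes, then an escaped '&gt;'
# where a literal '>' was expected. Ordinary '&gt;' in running text is left
# alone because it is not preceded by an unfinished tag.
BROKEN_TAG_END = re.compile(r"(</?[A-Za-z][^<>&]*)&gt;")


def repair_tag_ends(text):
    """Turn '</i&gt;' back into '</i>' and '<b&gt;' back into '<b>'."""
    return BROKEN_TAG_END.sub(r"\1>", text)


print(repair_tag_ends('<i class="p">adj</i&gt;'))
print(repair_tag_ends('<b&gt;1.</b>'))
```

Text such as "a <b &gt; c" would be misread as a tag by this pattern, so the output should be checked before converting.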

It's in the ENG-UKR directory on pCloud.

I've also changed the subroutine escapeHTMLStringForced to skip the contents of tags. Due to the 2-factor authentication on Github, I still have to figure out how to push the commits to the remote, though. (Changes pushed to github.)

The new code is:
Code:
our $PossibleTags = qr~/?(def|mbp|c>|c c="|abr>|ex>|kref>|k>|key|rref|f>|!--|!doctype|a|abbr|acronym|address|applet|area|article|aside|audio|b>|b |base|basefont|bb|bdo|big|blockquote|body|/?br|button|canvas|caption|center|cite|code|col|colgroup|command|datagrid|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|eventsource|fieldset|figcaption|figure|font|footer|form|frame|frameset|h[1-6]|head|header|hgroup|hr/|html|i>|i |iframe|img|input|ins|isindex|k|kbd|keygen|label|legend|li|link|map|mark|menu|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q>|rp|rt|ruby|s>|samp|script|section|select|small|source|span|strike|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u>|ul|var|video|wbr)~;
our $HTMLcodes = qr~(lt;|amp;|gt;|quot;|apos;|\#x?[0-9A-Fa-f]{1,6})~;
sub escapeHTMLString{
    my $String = shift;
    unless( $isEscapeHTMLCharacters ){
        info_t("returning without escaping '$String'");
        return $String;
    }
    return( escapeHTMLStringForced($String) );
}
sub escapeHTMLStringForced{
    my $String = shift;
    unless( defined $String ){ die2("Undefined string given to escapeHTMLStringForced."); }

    # Split the string into an array of complete tags and the text between them.
    my @String;
    while( $String =~ s~^([^<>]*)(<[^<>]+>)~~s ){
        push @String, $1 if defined $1;
        push @String, $2;
    }
    # Keep whatever follows the last complete tag; otherwise trailing text is dropped.
    push @String, $String if length $String;
    foreach(@String){
        # Skip complete tags, so that their contents are never escaped.
        if( m~^<[^<>]+>$~s ){ next; }
        # Convert '<' to '&lt;', but not if it's part of an HTML tag.
        s~<(?!\/?$PossibleTags[^>]*>)~&lt;~gs;
        # Convert '>' to '&gt;', but not if it's part of an HTML tag.
        s~(?<!<$PossibleTags[^>]{0,100})>~&gt;~sg;
        # Convert '&' to '&amp;', but not if it's part of an HTML escape sequence.
        s~&(?!$HTMLcodes)~&amp;~gs;
        s~'~\&apos;~sg;
        s~"~\&quot;~sg;
        # Curly braces become numeric entities: '{' is 123 and '}' is 125.
        s~\{~\&#123;~sg;
        s~\}~\&#125;~sg;
    }
    $String = join( '', @String );
    info_t("returning after escaped '$String'");
    return $String;
}
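The idea of the subroutine (split the string into tags and text, then escape only the text) can be shown in a few lines. The following is a simplified Python sketch, not a port of the Perl above: it treats anything of the form <...> as a tag rather than checking a list of known tag names, and the names `escape_outside_tags` and `escape_text` are made up for the example.

```python
import re

TAG = re.compile(r"(<[^<>]+>)")
# Entities that are already escaped and must not be escaped a second time.
ENTITY = r"(?:lt|amp|gt|quot|apos|#x?[0-9A-Fa-f]{1,6});"


def escape_text(chunk):
    """Escape the special characters in a chunk that contains no complete tag."""
    chunk = re.sub(r"&(?!" + ENTITY + ")", "&amp;", chunk)
    return (chunk.replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("'", "&apos;")
                 .replace('"', "&quot;"))


def escape_outside_tags(text):
    """Leave complete tags untouched and escape everything between them."""
    parts = TAG.split(text)  # odd indices hold the captured tags
    return "".join(part if index % 2 else escape_text(part)
                   for index, part in enumerate(parts))


print(escape_outside_tags('<b>a < b & "c"</b> d > e'))
```

Because the split uses a capturing group, the text after the last tag is kept as the final list element and is escaped like any other chunk.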

Last edited by Markismus; 12-28-2024 at 03:11 PM.
Old 12-10-2024, 09:36 AM   #289
stopchan
Member
Posts: 14
Karma: 464
Join Date: Nov 2024
Device: PocketBook 700 Era 16gb
Quote:
Originally Posted by Markismus View Post
I've switched a few toggles that impact unescaping HTML-characters and the Koreader optimized version looks good. You'll have to test the dic-file yourself.

It's in the ENG-UKR directory on pCloud.
Thanks a lot, Markismus!!! File .dic works well.
Old 12-27-2024, 12:32 PM   #290
mujina
Junior Member
Posts: 5
Karma: 920
Join Date: Dec 2024
Device: PocketBook Verse
First of all: Markismus, you rule, you rock & all other English idioms referring to an awesome person.
For the past week, since buying my PocketBook Verse, I've read pretty much everything I could find about PB dictionaries on MR and elsewhere. Took a lot of head-scratching and a full week, but thanks to you and all the other posters, a non-techie like me managed to defy staggering obstacles and compile their own .dic Japanese-English dictionary. Twice! I'm so grateful. No, really, I'm as happy as a kid on Christmas morning. I mean it.

Now for an issue, in case anyone feels like tinkering with it (or maybe it's the PB software itself?): when tapping to select the word to translate (PB doesn't seem to allow drag-to-select), the previous kana gets selected along with it, so the dictionary doesn't recognise it at first and I have to tap the search box, delete the first character and tap "search" again for the word to be found.
Example: in "木の枝葉", "the tree(木) 's(の) foliage(枝葉)". Tapping 枝葉 will select の枝葉. This never happens if the previous character is a kanji, or if the word is at the start of the sentence, or if there is a space before it (Japanese rarely uses spaces). BUT sometimes repeated tapping gets me the right selection.

Also, if this is normal behaviour and there's actually nothing wrong with my .dic, can I maybe upload it somewhere so other people can use it too?

Again, thank you millions. Learning Japanese is a passion, and e-books have changed my life. A "small" thing like having this dictionary on my PB means a lot to me.
Old 12-28-2024, 11:56 AM   #291
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@mujina Thanks for your kind words. I regularly get messages that the traffic limit is reached for the dictionary download, but it's nice to hear the script is appreciated directly, too.

Yes, it could have something to do with how you compiled your dictionary. The dictionary starts with a string given, which apparently includes your apostrophe. Looking at the jaK language files, I see beautiful Japanese characters in the keyboard.txt and morphems.txt files. However, the collates.txt file has no characters at all. I don't know what its source is. It's different from the one found in the JaR directory. (I've included the files in this post, in case you don't have them. Please put the txt-files of each zip-file in their own 3-letter directory, e.g. jar, jak. That way you can make the script and converter.exe use them by telling them that the lang-from is jar or jak.)
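Before recompiling, it may be worth checking that a language directory holds all three text files. The Python sketch below does that for a throwaway directory. The helper name `missing_language_files` is made up, and the demo directory is created only for the example.

```python
import tempfile
from pathlib import Path

# The three "accessory" files named in this thread.
REQUIRED = ("collates.txt", "keyboard.txt", "morphems.txt")


def missing_language_files(language_dir):
    """Return the required file names that are absent from language_dir."""
    directory = Path(language_dir)
    return [name for name in REQUIRED if not (directory / name).is_file()]


# Demo with a temporary, hypothetical 'jak' directory that lacks morphems.txt.
with tempfile.TemporaryDirectory() as root:
    jak = Path(root) / "jak"
    jak.mkdir()
    (jak / "collates.txt").write_text("", encoding="utf-8")
    (jak / "keyboard.txt").write_text("", encoding="utf-8")
    demo_missing = missing_language_files(jak)

print(demo_missing)
```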

A few posts back nhedgehog posted a conversion manual. It's in German, so if that's a problem, you'll have to run it through Google Translate.

It doesn't say much more than that collates.txt is there for the conversion of characters and that you should look at the difference between the collates files in the eng and rus directories for working examples. So you could try to use it to destroy the apostrophe. Normally it's done in the first line:
Code:
1234567890.,-_ '`!?:;*´"()「」。、[]<>{}/\*«»‘’‚‛“”„‟…=
It seems prudent to convert those symbols to Japanese characters and recompile the dictionary. It would replace, among others, the apostrophe with the symbol to the right of the equals (=) sign. In this case there is nothing there, so the apostrophe is destroyed.
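To make the "left of the equals sign is replaced by what is to its right" reading concrete, here is a small Python sketch. It is only a model of the file format as described in this thread, not of converter.exe's real behaviour, and the function names `parse_collates` and `normalise` are made up.

```python
def parse_collates(lines):
    """Map every character left of the last '=' to the text right of it."""
    table = {}
    for line in lines:
        sources, separator, target = line.rstrip("\n").rpartition("=")
        if not separator:
            continue  # no '=' on this line, nothing to learn from it
        for character in sources:
            table[character] = target
    return table


def normalise(word, table):
    """Apply the table character by character; unknown characters pass through."""
    return "".join(table.get(character, character) for character in word)


# A shortened, hypothetical collates file: punctuation maps to nothing,
# accented a maps to A, katakana KA (U+30AB) maps to hiragana KA (U+304B).
sample = ["'.,=", "a\u00e0\u00e1=A", "\u30ab=\u304b"]
table = parse_collates(sample)

print(repr(normalise("l'\u00e0.", table)))
```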

You write that it happens whenever the previous symbol is not a kanji, a space or the end of a sentence. However, I don't know whether this solution covers all instances, as I don't read Japanese. Other simple solutions seem to be to use another font with another symbol width, or to select a word starting from the rightmost character and moving left until the whole compound of characters is selected.

If you would be so kind as to upload the tweaked language files, then I could put them into the repository.

Last edited by Markismus; 12-28-2024 at 03:20 PM.
Old 12-31-2024, 04:13 AM   #292
mujina
Junior Member
Posts: 5
Karma: 920
Join Date: Dec 2024
Device: PocketBook Verse
Ah, my mistake. I didn't mention (forgot, in fact) that the collates.txt I used was from a different source. Sorry for sending you on the wrong track! Can't track down its origin anymore (would have liked to give credit), but I'm attaching it here. As you see, it has the "syllable characters" added at the end.
I must have phrased something wrong, because the apostrophe is not an issue - my bad for making it sound like that, English is not my first language.
BUT the key seems to lie in your last paragraph, about the symbol width. I think you found the source of the problem there. Upon closer observation, PB seems to always try to select three characters as a "word" to look up. No more, no less. With practice, I can tap in the right place (second or third character) to determine where the string begins. If my word consists of only the first character (as it often happens in Japanese), either the dictionary automatically drops the end characters, or I can edit the search and delete them. If it consists of more than three characters, the "edit search" button allows me to find it. Bottom line: the search is not perfect, but it works well enough, if you get used to it, within the limits of the three-character selection, which I guess is determined by PB's software itself.
Taking another look at the three "accessory" files (collates, keyboard, and morphems), it suddenly occurred to me: is keyboard.txt supposed to somehow influence the on-screen keyboard? Because I've never been able to type Japanese characters in the dictionary, all I get is the Latin ones. (Just idle curiosity, it's not that important.)
Despite the minor inconveniences, the dictionary works! Once again, I am very grateful. Have a happy New Year, you and everyone else here (assuming you celebrate it tonight), and I hope life gives you all the good things you deserve and more!
(I tried attaching the .dic itself, but it's seen as an "invalid file" for some reason.)
Attached Files
File Type: txt collates.txt (1.5 KB, 99 views)
Old 01-03-2025, 07:23 AM   #293
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Quote:
Originally Posted by mujina View Post
Ah, my mistake. I didn't mention (forgot, in fact) that the collates.txt I used was from a different source. Sorry for sending you on the wrong track! Can't track down its origin anymore (would have liked to give credit), but I'm attaching it here. As you see, it has the "syllable characters" added at the end.
Thanks, I'll add it to the repository.
Quote:
I must have phrased something wrong, because the apostrophe is not an issue - my bad for making it sound like that, English is not my first language.
No, it was my bad. You did say in your example " 's ", but I got sidetracked looking through the document. Realized it later, but was doing something else at that time.
Quote:
BUT the key seems to lie in your last paragraph, about the symbol width. I think you found the source of the problem there. Upon closer observation, PB seems to always try to select three characters as a "word" to look up. No more, no less. With practice, I can tap in the right place (second or third character) to determine where the string begins. If my word consists of only the first character (as it often happens in Japanese), either the dictionary automatically drops the end characters, or I can edit the search and delete them. If it consists of more than three characters, the "edit search" button allows me to find it. Bottom line: the search is not perfect, but it works well enough, if you get used to it, within the limits of the three-character selection, which I guess is determined by PB's software itself.
No idea how to control the 3 character setting. It seems to be a reason to file a patch request at Pocketbook. It sounds like a dirty hack that fails them for Japanese.
Quote:
Taking another look at the three "accessory" files (collates, keyboard, and morphems), it suddenly occurred to me: is keyboard.txt supposed to somehow influence the on-screen keyboard? Because I've never been able to type Japanese characters in the dictionary, all I get is the Latin ones. (Just idle curiosity, it's not that important.)
Yes, that's what it's for. Otherwise you cannot input words to look up in your dictionary.
Quote:
Despite the minor inconveniences, the dictionary works! Once again, I am very grateful. Have a happy New Year, you and everyone else here (assuming you celebrate it tonight), and I hope life gives you all the good things you deserve and more!
You have a very lovely 2025, too!
Quote:
(I tried attaching the .dic itself, but it's seen as an "invalid file" for some reason.)
Zip the file, so that it is a zip-file. Mobileread accepts those. Otherwise you can post a link with we.transfer or another single file transfer service. However, it would be even nicer if you uploaded the source file(s) you used. Then I can convert them to Koreader, too. It would make the conversation a bit more interesting, too. As I don't have a Pocketbook anymore, all conversation about Japanese has been purely academic up to now.

Last edited by Markismus; 01-03-2025 at 08:45 AM.
Old 01-03-2025, 07:35 AM   #294
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Looking at the collates.txt file, I see a lot of replacements that don't seem to make sense.
Code:
「」。、.,-_ '`!?:;*´"()[]<>{}/\*«»‘’‚‛“”„‟…=
aÀÁÂÃÄÅÆĀĂĄǠǢǍǞǺǼȀȂȦàáâãäåæāăąǡǣǎǟǻǽȁȃȧ=A
bƀƁƂƃƄƅ=B
cÇĆĈĊČƆƇçćĉċčƈ=C
dĎĐƉƊƋďđƌƍ=D
eÈÉÊËĒĔĖĘĚƎƏƐȄȆȨèéêëēĕėęěȅȇȩ=E
fƑƒ=F
gĜĞĠĢƓǤǦǴĝğġģǥǧǵ=G
hĤĦǶȞĥħƕǷȟ=H
iÌÍÎÏĨĪĬĮİIJƖȈȊìíîïĩīĭįıijƗȉȋ=I
jĴĵ=J
kĶƘǨķĸƙǩ=K
lĹĻĽĿŁĺļľŀłƛƚ=L
mƜµ=M
nÑŃŅŇŊƝñńņňʼnŋƞ=N
oÒÓÔÕÖØŌŎŐŒƟƠƢǑǪǬǾȌȎȪȬȮȰòóôõöøōŏőœơƣǒǫǭǿȍȏȫȭȯȱ=O
pƤƥ=P
q=Q
rŔŖŘƦȐȒŕŗřȑȓ=R
sߌŜŞŠƧƩȘśŝşšƨƪș=S
tŢŤŦƬƮȚţťŧƫƭț=T
uÙÚÛÜŨŪŬŮŰŲƯƱǓǕǗǙǛȔȖùúûüũūŭůűųưƲǔǖǘǚǜȕȗ=U
v=V
wŴŵ=W
x=X
yÝŶŸƳȲýÿŷƴȳ=Y
zŹŻŽƵƷƹƻǮȤźżžƶƸƺǯȥ=Z
アァ=あ
イィ=い
ウゥ=う
エェ=え
オォ=お
カ=か
キ=き
ク=く
ケ=け
コ=こ
サ=さ
シ=し
ス=す
セ=せ
ソ=そ
タ=た
チ=ち
ツ=つ
テ=て
ト=と
ナ=な
ニ=に
ヌ=ぬ
ネ=ね
ノ=の
ハ=は
ヒ=ひ
フ=ふ
ヘ=へ
ホ=ほ
マ=ま
ミ=み
ム=む
メ=め
モ=も
ラ=ら
リ=り
ル=る
レ=れ
ロ=ろ
ワヮ=わ
ヤ=や
ユ=ゆ
ヨ=よ
ヲ=を
ャ=ゃ
ュ=ゅ
ョ=ょ
ガ=が
ギ=ぎ
グ=ぐ
ゲ=げ
ゴ=ご
ザ=ざ
ジ=じ
ズ=ず
ゼ=ぜ
ゾ=ぞ
ダ=だ
ヂ=ぢ
ヅ=づ
デ=で
ド=ど
バ=ば
ビ=び
ブ=ぶ
ベ=べ
ボ=ぼ
パ=ぱ
ピ=ぴ
プ=ぷ
ペ=ぺ
ポ=ぽ
ン=ん
ッ=っ
For instance, there are 'ヘ=へ' and 'ペ=ぺ', which replace a symbol with the same symbol?
And 'ジ=じ' and 'ヅ=づ' replace the same symbol with 2 different ones. (Probably only one is used.)
As they are just nice characters to me without any reference, I can't really say whether they make sense or improve your dictionary search. If your understanding of Japanese is good enough, would you be willing to look through it?

Looking at the keyboard.txt file for jaK, I see 4 pages with keyboard layouts. Lower-case roman, Upper-case roman and then two nearly identical pages with characters --small upper right strokes versus upper right circles-- separated with a new separator. The first is separated with a ' ~ - ' and 2 empty lines. The second also with a ' ~ - ' and no empty lines. Then a wholly new separator '--' between two pages with characters.
Code:
JP: Japan
           
           
qwertyuio
asdfghjkl
zxcvbnmp'
    ~ -    
           
           
QWERTYUIO
ASDFGHJKL
ZXCVBNMP'
    ~ -    
わ ら ま は な た さ か あ bs
や り み ひ に ち し き い cl
ゆ る む ふ ぬ つ す く う ん
よ れ め へ ね て せ け え sh
を ろ も ほ の と そ こ お ok
--
わ ら ぱ ば な だ ざ が あ bs
ゃ り ぴ び に ぢ じ ぎ い cl
ゅ る ぷ ぶ っ づ ず ぐ う ん
ょ れ ぺ べ ね で ぜ げ え sh
を ろ ぽ ぼ の ど ぞ ご お ok
If you can't see any Japanese characters I would start with changing all the separators to the ones used between the lower and upper case roman character keyboard layouts, recompile and see whether it has an impact.

Last edited by Markismus; 01-03-2025 at 07:37 AM.
Old 01-03-2025, 08:24 AM   #295
EastEriq
Groupie
Posts: 199
Karma: 195502
Join Date: Jan 2018
Device: Cybook Orizon, PocketBook Touch HD
Quote:
For instance, there are 'ヘ=へ' and 'ペ=ぺ', which replace a symbol with the same symbol?
They are the katakana and hiragana for he and pe. They make perfect sense. You can see that they have different Unicode code points.
Quote:
And 'ジ=じ' and 'ヅ=づ' replace the same symbol with 2 different ones. (Probably only one is used.)
Ditto. Katakana and hiragana for zi and du, two different syllables. Compare the katakana ジ ヅ.
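The difference is easy to verify with Python's standard `unicodedata` module. The sketch below prints the code point and official Unicode name for each pair from the collates file, and shows that the katakana block sits at a fixed offset of 0x60 above the hiragana block. The helper `katakana_to_hiragana` is only an illustration and is not part of any PocketBook tool.

```python
import unicodedata

# (katakana, hiragana) pairs from the collates.txt lines discussed above,
# written as escapes so that look-alike glyphs cannot be mixed up.
pairs = [
    ("\u30d8", "\u3078"),  # he
    ("\u30da", "\u307a"),  # pe
    ("\u30b8", "\u3058"),  # zi
    ("\u30c5", "\u3065"),  # du
]

for katakana, hiragana in pairs:
    print(f"U+{ord(katakana):04X} {unicodedata.name(katakana)}"
          f" -> U+{ord(hiragana):04X} {unicodedata.name(hiragana)}")


def katakana_to_hiragana(text):
    """Shift katakana U+30A1..U+30F6 down by 0x60 to the matching hiragana."""
    return "".join(
        chr(ord(character) - 0x60) if "\u30a1" <= character <= "\u30f6" else character
        for character in text
    )
```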

What may be debatable are all the exotic Latin replacements. I can't imagine a Japanese reader running into romaji words with Ɯ, ǻ, Ƿ and the like....
Old 01-03-2025, 08:48 AM   #296
Markismus
Guru
Posts: 942
Karma: 149883
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
Yes, nice! @EastEriq, could you correct the collates.txt file so that it improves upon the jaK file? It won't be usable for the JaR way of writing characters, right?
Old 01-05-2025, 05:32 AM   #297
mujina
Junior Member
Posts: 5
Karma: 920
Join Date: Dec 2024
Device: PocketBook Verse
First: I feel so happy that you guys are taking an interest in the subject! Hoping this can help more people down the line.
Also: It's exactly as EastEriq said about the similar-looking characters, that part is likely as it should be.
Quote:
What may be debatable are all the exotic Latin replacements. I can't imagine a Japanese reader running into romaji words with Ɯ, ǻ, Ƿ and the like....
Quite right, I suppose they were trying to cover all the bases (or used a different language file as the starting point). At least (if I understand correctly), those extras will just never get used, but they won't impede the functionality.
Quote:
If you can't see any Japanese characters I would start with changing all the separators to the ones used between the lower and upper case roman character keyboard layouts, recompile and see whether it has an impact.
Thank you, tried it just now. Just to check if I got it right: I replaced both the second ~ - and the - - with ~ - followed by two blank lines. No change, though. So far, I've been making up for the lack of keyboard by just scrolling down through the suggested words.
But, since a native-character keyboard would definitely help the dictionary, I'm continuing the discussion here, though the keyboard part may be slightly off-topic.

I wonder (total ignorant here, so it may be a silly idea) if the keyboard doesn't work because the number of Japanese characters is higher than the number of keys on the standard QWERTY keyboard? Or (and I think this is more likely) it's not supposed to work that way, by simply mapping a character to a key.

This is what other virtual Japanese keyboards do: The characters you see in keyboard.txt are syllables - well, not quite, but it's the closest thing. か is ka, ね is ne and so on. To write か, you don't press a specific key; first you type a "k" and the keyboard writes in a k provisionally (or just waits). Then, if you type an "a", it writes a か (and suggests other characters that have the "ka" reading, like 家, 火, 歌...), or a く and the alternatives if you write a "u" after it, etc. etc.
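The "type k, then a, get か" behaviour can be modelled with a lookup table and a buffer of pending keystrokes. The Python sketch below is a toy with a five-entry table, meant only to illustrate the description above. The names `ROMAJI_TO_KANA` and `type_keys` are made up, and the kanji suggestions are left out entirely.

```python
# A tiny, hypothetical romaji table; a real input method has a far larger one.
ROMAJI_TO_KANA = {"a": "あ", "ka": "か", "ku": "く", "ne": "ね", "ko": "こ"}


def type_keys(keys, table=ROMAJI_TO_KANA):
    """Return (kana written so far, keystrokes still waiting for a vowel)."""
    output, pending = [], ""
    for key in keys:
        pending += key
        if pending in table:
            # The buffered consonant plus this key form a syllable.
            output.append(table[pending])
            pending = ""
    return "".join(output), pending


print(type_keys("k"))     # nothing written yet, 'k' is still waiting
print(type_keys("ka"))
print(type_keys("neko"))
```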

I've tried the Chinese keyboard that comes pre-installed, and it works the same way Japanese ones do (I don't know Chinese, but the workings should be very similar). I've attached a picture, so you can see it takes input from the regular QWERTY: I've written "chuu", which has temporarily been input into the search box, and on top of the keys I'm given a selection of characters which have the "chuu" reading and which I can tap to input and search them.
Just how it does that, though, is a mystery to me. Prompted by another thread here (thank you, @EastEriq!), I've looked into system/language/keyboard and system/config/global.cfg. The first only contains files for EN, IT, RU and UA (so not even for all the languages that come with the device). The second does include a line that reads language=ZH (which I'm taking to be Chinese), but the following lines are nothing I can make sense of. (I can copy-paste the content of global.cfg if anyone is interested.) The device doesn't even have a Chinese dictionary, but it must keep all those characters somewhere.

Also, I remember someone saying on some forum that they only got their Japanese keyboard to work after switching to the pre-installed Chinese keyboard, then switching back to the English one. Odd.

I've attached a .zip with 1) the accessory files (with the extended, "better" collates.txt, but the other two are directly from your JaK folder, Markismus), 2) the .xdxf JA-EN dictionary, which, iirc, I converted from the awesome JMdict, and 3) the resulting .dic dictionary for Pocketbook.
Attached Thumbnails
IMG_20250105_114126_346.jpg (969.0 KB, 107 views)
Attached Files
File Type: zip Japanese.zip (33.17 MB, 107 views)
Old 01-05-2025, 05:53 AM   #298
EastEriq
Groupie
Posts: 199
Karma: 195502
Join Date: Jan 2018
Device: Cybook Orizon, PocketBook Touch HD
Quote:
@EastEriq, could you correct the collates.txt
It would be better if someone who really reads Japanese does it. In any event, at a cursory look the file looks right to me. The zeal in exotic Latin substitutions is probably harmless. I have no idea whether too many unnecessary candidates negatively affect the performance of the dic app, but I'd presume not.
From the design point of view, I'd say that they may be relevant to dictionaries something->ja where the language looked up uses them.

As for input system, I don't know. General Japanese input is elaborate, usually you type phonetically and software proposes somehow (i.e. with popups) the possible kanji matching and completions; I wouldn't count on something like that to be implemented in PB. IIRC there were posts earlier in this thread about a kana-only-input Japanese dic.

ETA: https://www.mobileread.com/forums/sh...se#post4130678

Last edited by EastEriq; 01-05-2025 at 02:23 PM.
Old 01-05-2025, 02:54 PM   #299
EastEriq
Groupie
Posts: 199
Karma: 195502
Join Date: Jan 2018
Device: Cybook Orizon, PocketBook Touch HD
Quote:
Thank you, tried it just now. Just to check if I got it right: I replaced both the second ~ - and the - - with ~ - followed by two blank lines.
What if you use something like the one attached? I've corrected the one posted above (hiragana), according to the format of many others, including non latin ones, found in /ebrmain/language/keyboard/. That is, no blank lines, a first nonspaced block with upper and lowercase romaji, -- separators, and two blocks of spaced characters, terminated by -- on the last line.
By the way, looking at other keyboards, I think you can make virtual keys wider by adding :n after them, e.g.
Code:
sp:4  ok:2
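To illustrate the layout just described (pages separated by -- lines, keys separated by spaces, an optional :n width), here is a Python sketch that reads such text into a structure. It is a guess at the format based on this post alone: the default width of 1 and the names `parse_row` and `split_pages` are assumptions.

```python
import re

# A key label, optionally followed by ':' and a width, e.g. 'sp:4'.
KEY = re.compile(r"^(?P<label>.+?)(?::(?P<width>\d+))?$")


def parse_row(row):
    """Split one keyboard row into (label, width) pairs; width defaults to 1."""
    keys = []
    for token in row.split():
        match = KEY.match(token)
        keys.append((match.group("label"), int(match.group("width") or 1)))
    return keys


def split_pages(text, separator="--"):
    """Group the lines of a keyboard file into pages delimited by '--' lines."""
    pages, current = [], []
    for line in text.splitlines():
        if line.strip() == separator:
            if current:
                pages.append(current)
            current = []
        else:
            current.append(line)
    if current:
        pages.append(current)
    return pages


print(parse_row("sp:4  ok:2"))
print(split_pages("qwertyuio\nasdfghjkl\n--\nsp:4  ok:2\n--\n"))
```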
Attached Files
File Type: txt keyboard.txt (498 Bytes, 69 views)

Last edited by EastEriq; 01-05-2025 at 02:57 PM.
Old 01-05-2025, 03:57 PM   #300
mujina
Junior Member
Posts: 5
Karma: 920
Join Date: Dec 2024
Device: PocketBook Verse
@EastEriq, wow, I've been trying to do the exact same thing today, looking at other keyboard.txt files floating around the internet and trying to figure out what's different. My attempts weren't as precise as yours, though.
It still doesn't work, BUT it suddenly hit me - why didn't I actually open one of those other files in my system/language/keyboard, just because they were for different languages? It was silly of me not to, because now I see the content looks completely different. Their extension is .extkbd, but they open with a regular text editor and the EN one, for instance, looks like this:

#en_kb tab delimeted
#key normal shift caps shiftcaps

#KEY_ESC ESC
KEY_1 1 ! 1 !
KEY_2 2 @ 2 @
KEY_3 3 # 3 #
KEY_4 4 $ 4 $
KEY_5 5 % 5 %
KEY_6 6 ^ 6 ^
KEY_7 7 & 7 &
KEY_8 8 * 8 *
KEY_9 9 ( 9 (
KEY_0 0 ) 0 )
KEY_MINUS - _ - _
KEY_EQUAL = + = +
#KEY_BACKSPACE BACKSPACE
#KEY_TAB TAB
KEY_Q q Q Q q
KEY_W w W W w
KEY_E e E E e
KEY_R r R R r
KEY_T t T T t
KEY_Y y Y Y y
KEY_U u U U u
KEY_I i I I i
KEY_O o O O o
KEY_P p P P p
KEY_LEFTBRACE [ { [ {
KEY_RIGHTBRACE ] } ] }
#KEY_ENTER ENTER
#KEY_LEFTCTRL LEFTCTRL
KEY_A a A A a
KEY_S s S S s
KEY_D d D D d
KEY_F f F F f
KEY_G g G G g
KEY_H h H H h
KEY_J j J J j
KEY_K k K K k
KEY_L l L L l
KEY_SEMICOLON ; : ; :
KEY_APOSTROPHE ' " ' "
KEY_GRAVE ` ~ ` ~
#KEY_LEFTSHIFT
KEY_BACKSLASH \ | \ |
KEY_Z z Z Z z
KEY_X x X X x
KEY_C c C C c
KEY_V v V V v
KEY_B b B B b
KEY_N n N N n
KEY_M m M M m
KEY_COMMA , < , <
KEY_DOT . > . >
KEY_SLASH / ? / ?
#KEY_RIGHTSHIFT
KEY_KPASTERISK * * * *
#KEY_LEFTALT
KEY_SPACE SPACE
#KEY_CAPSLOCK
KEY_DELETE DELETE
KEY_BACKSPACE BACKSPACE

As soon as I have time, I'll try to figure out what each thingy does and make an equivalent for Japanese. Wondering if I should try replacing the first and second column, or the third and fourth? (I'm working on intuition here, I don't know the first thing about how it's done, but trial and error should do it.)
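As a starting point for that experiment, the Python sketch below reads lines in the format shown above into a dictionary, so each key's four columns can be inspected by name. The column names come from the file's own header comment. Splitting on any whitespace (the header says tab-delimited) and the name `parse_extkbd` are assumptions, and the sketch says nothing about which columns the device actually uses.

```python
COLUMNS = ("normal", "shift", "caps", "shiftcaps")


def parse_extkbd(text):
    """Map each KEY_* name to a dict of the columns present on its line."""
    layout = {}
    for line in text.splitlines():
        # Only lines that START with '#' are comments; '#' can also be a value.
        if not line.strip() or line.startswith("#"):
            continue
        key, *values = line.split()
        layout[key] = dict(zip(COLUMNS, values))
    return layout


sample = """#key normal shift caps shiftcaps
KEY_3 3 # 3 #
KEY_Q q Q Q q
#KEY_TAB TAB
KEY_SPACE SPACE
"""

layout = parse_extkbd(sample)
for key, columns in layout.items():
    print(key, columns)
```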

Of course, even if it works, I expect all it will do is allow one to write hiragana and maybe katakana, without that wonderful magic that brings up kanji options in the suggestion box above the keyboard. Wish I knew how that happens. But it's better than nothing! (Also, I tried adding Chinese as a second keyboard earlier and now my English keyboard also shows suggestions on top. I could swear they weren't there earlier. Funny.)

Hope to return with successful updates. (Not promising anything about a legendary Monday, though - I've read enough of this forum to know better )