MobileRead Forums > E-Book Readers > Kobo Reader
02-01-2020, 02:23 PM   #31
Alanon
Connoisseur
Posts: 57
Karma: 1644
Join Date: Jul 2017
Location: Serbia
Device: Kobo Aura One
Quote:
Originally Posted by NiLuJe
@Alanon: I am mildly intrigued. Which source did you use for the SOED, because, AFAIK, none of the digital offerings available are in a sane format (i.e., I'm only aware of the awful Windows CD-ROM versions).
Hmm, to be honest, I can't remember exactly. I converted about a dozen or so dictionaries more than two years ago. In the meantime I've forgotten almost everything about the process. In fact, I'm considering doing a fresh conversion of a better version of the OED, but it once again looks like a daunting task. Lots of things to re-learn.

I use a free bit of kit called GoldenDict for my dictionaries, which accepts nearly every format known to man. That versatility means there's a terrific community of modders/rippers/compilers, and I've amassed quite a hoard of dictionaries. If I had to guess, I'd say the source was a .dsl file (the semi-proprietary format of the ABBYY Lingvo dictionary software) of the 6th edition of the SOED. I've always presumed that version came from a rip of the CD-ROM data. I'm still using that .dsl in GoldenDict on my desktop and Android, and identical searches in Kobo and GoldenDict give identical results, so it would make sense.
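For anyone curious what a .dsl looks like under the hood, here is a minimal Go sketch that splits one into entries. It rests on assumptions about the format: the text is already decoded to UTF-8 (many .dsl files ship as UTF-16LE and need converting first), lines starting with '#' are header directives, unindented lines are headwords, and indented lines belong to the card body. The `Entry` type and `parseDSL` function are hypothetical names, and DSL markup such as [m1] or [trn] is left in place rather than translated.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Entry holds one group of headwords and the raw DSL body lines after them.
type Entry struct {
	Headwords []string
	Body      []string
}

// parseDSL is a simplified sketch: it assumes UTF-8 input, skips '#'
// header directives and blank lines, treats unindented lines as headwords
// and indented lines as body text. Markup is not interpreted.
func parseDSL(text string) []Entry {
	var entries []Entry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimRight(sc.Text(), "\r")
		switch {
		case strings.TrimSpace(line) == "":
			continue // blank separator line
		case strings.HasPrefix(line, "#"):
			continue // header directive, e.g. #NAME or #INDEX_LANGUAGE
		case line[0] == ' ' || line[0] == '\t':
			// Body line: attach it to the most recent entry, if any.
			if n := len(entries); n > 0 {
				entries[n-1].Body = append(entries[n-1].Body, strings.TrimSpace(line))
			}
		default:
			// Headword line. Consecutive headwords (e.g. spelling variants)
			// share the body that follows them.
			n := len(entries)
			if n == 0 || len(entries[n-1].Body) > 0 {
				entries = append(entries, Entry{})
				n++
			}
			entries[n-1].Headwords = append(entries[n-1].Headwords, line)
		}
	}
	return entries
}

func main() {
	sample := "#NAME \"Sample\"\n\ncolour\ncolor\n\t[m1]a hue[/m1]\nhue\n\t[m1]a colour[/m1]\n"
	for _, e := range parseDSL(sample) {
		fmt.Println(strings.Join(e.Headwords, " / "), "->", strings.Join(e.Body, " "))
	}
}
```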

I do remember having to double-convert some dictionaries: once with some ancient tool, to turn what I had into a format that the Penelope of that time (was it even called that back then?) could read, and once with Penelope itself to get it to work with Kobo. Perhaps that's what I did with the SOED? Regardless, it converted remarkably well. The file even retained style rules, layout and line breaks, which not all of my conversions did.

Drop me a line if you'd like to tinker with it.
02-01-2020, 02:23 PM   #32
jackie_w
Grand Sorcerer
Posts: 5,270
Karma: 12221060
Join Date: Sep 2009
Location: UK
Device: PRS-350, Kobo: Aura6", H2O, GloHD, KA1, ClaraHD, Forma
@geek1011,

I copied the whole directory to my Win10 C:\Program Files directory.

The following CMD
Code:
"C:\Program Files\marisa64\usr\local\bin\marisa-build.exe" -o words2 index_coe.txt
resulted in this error
Code:
"The code execution cannot proceed because libgcc_s_sjlj-1.dll was not found. Reinstalling the program may fix this problem."
followed by several consecutive similar errors. More details can be supplied if it would help.

In case this is relevant, I believe the much older Marisa Windows executables posted here on MR were 32-bit. They were all much smaller than these new ones and they were all standalone .exe's which could be copied to wherever was most convenient at the time. I say "all", but I think I've only actually used marisa-build.exe.

ETA: libstdc++-6.dll was also flagged as missing.
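For anyone hitting the same "not found" errors: a MinGW build that isn't statically linked imports runtime DLLs such as libgcc_s_sjlj-1.dll and libstdc++-6.dll, and Windows won't start it unless those are next to the .exe or on the PATH. The Go sketch below is an illustrative helper, not part of any tool mentioned in this thread, that lists which of those runtime DLLs a binary imports. It assumes Go's standard debug/pe package, whose `ImportedSymbols` method returns entries of the form "Symbol:library.dll"; the DLL list is not exhaustive.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
	"sort"
	"strings"
)

// mingwRuntime lists runtime DLLs that a dynamically linked MinGW-w64 C++
// build commonly needs beside the .exe. Not exhaustive.
var mingwRuntime = map[string]bool{
	"libgcc_s_sjlj-1.dll": true, // 32-bit, SJLJ exceptions
	"libgcc_s_dw2-1.dll":  true, // 32-bit, DWARF-2 exceptions
	"libgcc_s_seh-1.dll":  true, // 64-bit, SEH exceptions
	"libstdc++-6.dll":     true,
	"libwinpthread-1.dll": true,
}

// dllNames extracts the unique, lowercased DLL names from ImportedSymbols
// output ("Symbol:library.dll") and returns them sorted.
func dllNames(symbols []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, s := range symbols {
		i := strings.LastIndex(s, ":")
		if i < 0 {
			continue
		}
		dll := strings.ToLower(s[i+1:])
		if !seen[dll] {
			seen[dll] = true
			out = append(out, dll)
		}
	}
	sort.Strings(out)
	return out
}

// mingwDeps keeps only the names that belong to the MinGW runtime.
func mingwDeps(names []string) []string {
	var out []string
	for _, n := range names {
		if mingwRuntime[strings.ToLower(n)] {
			out = append(out, n)
		}
	}
	return out
}

// run reports the MinGW runtime DLLs imported by the PE file at path.
func run(path string) error {
	f, err := pe.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	syms, err := f.ImportedSymbols()
	if err != nil {
		return err
	}
	deps := mingwDeps(dllNames(syms))
	if len(deps) == 0 {
		fmt.Println("no MinGW runtime DLLs imported; the binary looks standalone")
		return nil
	}
	fmt.Println("needs these DLLs next to the .exe:", strings.Join(deps, ", "))
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: mingwdeps <file.exe>")
		return
	}
	if err := run(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A fully static build (like the one posted further down) should report no MinGW runtime DLLs at all.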

Last edited by jackie_w; 02-01-2020 at 02:54 PM. Reason: ETA
02-01-2020, 02:41 PM   #33
Semwize
Posts: 342
Karma: 118814
Join Date: Jun 2016
Device: Kobo
Quote:
Originally Posted by jackie_w
libgcc_s_sjlj-1.dll was not found
I put this DLL in the folder and received error 0xc000007b.
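Error 0xc000007b is STATUS_INVALID_IMAGE_FORMAT, and a common cause is mixing 32-bit and 64-bit binaries, for example a 32-bit libgcc DLL dropped next to a 64-bit .exe (or vice versa). As an illustrative check, assuming Go's standard debug/pe package, this sketch prints the architecture recorded in each file's PE header so a mismatch stands out. The `machineName` helper is a hypothetical name.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
)

// machineName maps a PE FileHeader.Machine value to a readable label.
func machineName(m uint16) string {
	switch m {
	case pe.IMAGE_FILE_MACHINE_I386:
		return "32-bit (x86)"
	case pe.IMAGE_FILE_MACHINE_AMD64:
		return "64-bit (x86-64)"
	default:
		return fmt.Sprintf("other (0x%04x)", m)
	}
}

func main() {
	// Pass the .exe and every DLL sitting next to it, e.g.
	//   go run . marisa-build.exe libgcc_s_sjlj-1.dll libstdc++-6.dll
	for _, path := range os.Args[1:] {
		f, err := pe.Open(path)
		if err != nil {
			fmt.Printf("%s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %s\n", path, machineName(f.FileHeader.Machine))
		f.Close()
	}
}
```

All files loaded into one process need to report the same architecture.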
02-01-2020, 02:49 PM   #34
geek1011
Wizard
Posts: 1,579
Karma: 4131350
Join Date: May 2016
Location: Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Marisa tools for Windows

Try this one. I've built it statically (./configure --enable-shared=no --enable-static=yes --host=i686-w64-mingw32 LDFLAGS="-static -static-libgcc -static-libstdc++").

The other one worked for me when testing in Wine, and I didn't test it on Windows. I've tested this one in an actual Windows VM, so it should work fine (the binaries are standalone).

P.S. The reason these are so much larger is that I'm cross-compiling C++ with MinGW rather than MSVC.
Attached Files
File Type: zip marisa-trie_i686-mingw-w64-static_970b20c.zip (7.82 MB)

Last edited by geek1011; 02-03-2020 at 12:03 AM. Reason: added header
02-01-2020, 03:21 PM   #35
jackie_w
Grand Sorcerer
Posts: 5,270
Karma: 12221060
Join Date: Sep 2009
Location: UK
Device: PRS-350, Kobo: Aura6", H2O, GloHD, KA1, ClaraHD, Forma
Quote:
Originally Posted by geek1011
Try this one. I've built it statically (./configure --enable-shared=no --enable-static=yes --host=i686-w64-mingw32 LDFLAGS="-static -static-libgcc -static-libstdc++").

The other one worked for me when testing in Wine, and I didn't test it on Windows. I've tested this one in an actual Windows VM, so it should work fine (the binaries are standalone).

P.S. The reason these are so much larger is that I'm cross-compiling C++ with MinGW rather than MSVC.
I don't think I understood one in three words of that, but the good news is that the new marisa-build.exe now runs. Thanks for taking the time.

Now to experiment with marisa-dump.exe ...
02-01-2020, 04:44 PM   #36
rtiangha
Addict
Posts: 239
Karma: 60749
Join Date: Jul 2016
Device: Kobo Glo HD
Quote:
Originally Posted by NiLuJe
@Alanon: I am mildly intrigued. Which source did you use for the SOED, because, AFAIK, none of the digital offerings available are in a sane format (i.e., I'm only aware of the awful Windows CD-ROM versions).
Just generally commenting on how PyGlossary is a wonderful tool, especially when teamed up with Penelope for Kobos. And it seems like all the cool kids are using MDict these days (cough)...
02-02-2020, 05:25 PM   #37
geek1011
Wizard
Posts: 1,579
Karma: 4131350
Join Date: May 2016
Location: Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Kobo prefix logic

Here's all the logic used to generate the prefixes. I've tested it against libnickel, and it's also based on the disassembly of DictionaryParser::htmlForWord (it was slightly annoying that most of the Qt stuff was inlined). I've simplified it and improved the performance, and also left in the original code for reference.

I'll write some proper documentation and make a thread when I finish dictutil later.

Here is the code: https://sourcegraph.com/github.com/g...util.go#L30-86
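To make the idea of prefix sharding concrete, here is a deliberately simplified Go sketch. It is not the libnickel logic or the linked code: the rules it applies (trim whitespace, lowercase, take the first two characters, pad a one-character word with 'a') are assumptions for illustration only, and the real implementation handles more cases, including the special kanji path. The `wordPrefix` name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wordPrefix is a HYPOTHETICAL, simplified approximation of mapping a
// word to the prefix of the dictionary shard that holds it. Assumed rules:
// trim whitespace, lowercase, take the first two runes, and pad a
// single-rune word with 'a'. Not the actual libnickel behaviour.
func wordPrefix(word string) string {
	r := []rune(strings.ToLower(strings.TrimSpace(word)))
	switch len(r) {
	case 0:
		return "" // empty input: real behaviour not modelled here
	case 1:
		return string(r[0]) + "a"
	default:
		return string(r[:2])
	}
}

func main() {
	for _, w := range []string{"Apple", "  zebra", "x", "Éclair"} {
		fmt.Printf("%q -> %q\n", w, wordPrefix(w))
	}
}
```

Working on runes rather than bytes keeps accented first letters intact; anything beyond that should be checked against the real code or dictword-test.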

v1/v2 dictionary stuff

And here are some useful notes about v1/v2 dictionaries (this hasn't ever been discussed, or even noticed before AFAIK): https://pgaskin.net/dictutil/dicthtml/v1v2

Last edited by geek1011; 02-03-2020 at 12:03 AM. Reason: added header
02-02-2020, 07:16 PM   #38
rtiangha
Addict
Posts: 239
Karma: 60749
Join Date: Jul 2016
Device: Kobo Glo HD
Quote:
Originally Posted by geek1011
Here's all the logic used to generate the prefixes. I've tested it against libnickel, and it's also based on the disassembly of DictionaryParser::htmlForWord (it was slightly annoying that most of the Qt stuff was inlined). I've simplified it and improved the performance, and also left in the original code for reference.

I'll write some proper documentation and make a thread when I finish dictutil later.

Here is the code: https://sourcegraph.com/github.com/g...util.go#L30-86

And here are some useful notes about v1/v2 dictionaries (this hasn't ever been discussed, or even noticed before AFAIK): https://pgaskin.net/dictutil/dicthtml/v1v2
Wow, good work. I knew of the difference in dictionary versions, but wasn't sure what exactly made them different.

Looking forward to seeing your documentation on the dictionary format and the definitions of the various tags. I'm wondering if there are any obscure tags I have yet to encounter.

I'm curious about Kanji: Will dictutil be able to handle those properly? Just wondering since the code says it's a special case.

One of my main interests is making bilingual Japanese dictionaries from my own word lists (say, my Anki flash card deck or the vocabulary from one of my Japanese textbooks, in CSV or tab-separated format). That would help simplify definitions to my reading level and keep them consistent with what I'm learning and may be tested on later. Japanese in particular has given me the most problems, though: dictionaries made with Penelope sometimes work and sometimes don't. I suppose I could manually write, tag and sort my own version into the various files, but I'd like to avoid that if possible.

And I believe tshering discovered that kanji lookup only really works properly when using one of the built-in Japanese dictionaries (either jaaxdis, en-ja or en-ja-pgs), especially if you're not using the Japanese locale, because it may use a different function compared to the other languages. (I guess the locale matters for bringing up the Japanese keyboard; I'm not sure how it works with Chinese now that it is a supported language, but word highlight/lookup still seems fine regardless of OS language.) Is that still the case? If so, is it possible (maybe through a patch?) to make kanji lookup work regardless of the dictionary selected, for example in order to have more than three Japanese-related dictionaries installed? At the very least, I want to create a ja-en dictionary. While I'm using norbusan's utility to enhance the built-in jaxxdis dictionary, I would really love to create an updated one based on JMDict or this random Kenkyuusha one that somehow made its way into my possession (cough).

Last edited by rtiangha; 02-02-2020 at 07:38 PM.
02-02-2020, 11:58 PM   #39
geek1011
Wizard
Posts: 1,579
Karma: 4131350
Join Date: May 2016
Location: Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by rtiangha
I'm curious about Kanji: Will dictutil be able to handle those properly? Just wondering since the code says it's a special case.
No, sorry, that won't be in the first version. The thing is, it's a whole separate code path which I'd need to figure out, and also, I don't think anyone's looked into how to treat a custom dictionary as Japanese (the Kanji stuff is based on the locale).

Cross-compiling libmarisa

I've finally got marisa to cross-compile properly on all platforms for dictutil: https://github.com/geek1011/dictutil...8d1cb469d4da8a. I can't believe it actually worked, though: I did it by writing a tool that merges all the C/C++ sources into a single file, resolving the includes, so that it can easily be compiled by cgo.

Testing Kobo prefix generation

I've also written dictword-test (https://pgaskin.net/kobo-mods/dictword-test/), which allows you to test libnickel's prefix generation directly. This tool is completely self-contained and doesn't conflict with or require patching libnickel. Binaries are available here: https://ci.appveyor.com/project/geek...uild/artifacts.

Last edited by geek1011; 02-03-2020 at 12:02 AM. Reason: dictword-test
02-03-2020, 09:35 PM   #40
skybook
Member
Posts: 22
Karma: 10
Join Date: Jan 2020
Device: Kobo Libra H2O
Quote:
Originally Posted by geek1011
Marisa tools for Windows

Try this one. I've built it statically (./configure --enable-shared=no --enable-static=yes --host=i686-w64-mingw32 LDFLAGS="-static -static-libgcc -static-libstdc++").

The other one worked for me when testing in Wine, and I didn't test it on Windows. I've tested this one in an actual Windows VM, so it should work fine (the binaries are standalone).

P.S. The reason these are so much larger is that I'm cross-compiling C++ with MinGW rather than MSVC.
Using this file, I finally got my custom dictionary to work!! (Granted, I can't look up words with apostrophes or spaces, but the dictionary installed and the other word lookups worked!)

Thank you geek1011 and rtiangha! (And everyone else who helped me get to this point)

(BTW rtiangha, I love the dictionary+thesaurus, especially the formatting! So far the word lookups are accurate, with the one exception of the word 'invalided', but the WordNet 2 file I found here has that definition, so I'm happy.)
02-07-2020, 04:45 AM   #41
droopy
Addict
Posts: 337
Karma: 800008
Join Date: Apr 2009
Device: Kobo Forma. Laptop is Linux Mint
Quote:
Originally Posted by rtiangha
Thanks, rtiangha. I am a point-and-click computer user and don't have programming/coding skillz. So I very much appreciate your sharing the dictionary zip file for easy installation.
All times are GMT -4.
MobileRead.com is a privately owned, operated and funded community.