Old 12-01-2012, 05:43 AM   #121
ShellShock
Wizard
Posts: 1,176
Karma: 2431850
Join Date: Sep 2008
Device: IPad Mini 2 Retina
It is certainly technically possible to convert a commercial dictionary to Kobo format, assuming you already own the commercial version and it is for personal use only; even then it may not be legal, but I am not a lawyer so you need to make your own judgement about that. Morally you could argue that you only bought the commercial version so you could format shift it to Kobo format, so it is a win for the publisher - they have sold a copy which they would not have done otherwise. Of course, the resulting Kobo dictionary must not be distributed to anyone else, which would definitely be morally and legally wrong.
Old 12-01-2012, 05:54 AM   #122
Papi
Addict
Posts: 311
Karma: 547600
Join Date: Jul 2010
Location: Paris
Device: Kindle Keyboard, Kindle NT, PRS-650
Quote:
Originally Posted by ShellShock
It is certainly technically possible to convert a commercial dictionary to Kobo format, assuming you already own the commercial version and it is for personal use only; even then it may not be legal, but I am not a lawyer so you need to make your own judgement about that. Morally you could argue that you only bought the commercial version so you could format shift it to Kobo format, so it is a win for the publisher - they have sold a copy which they would not have done otherwise. Of course, the resulting Kobo dictionary must not be distributed to anyone else, which would definitely be morally and legally wrong.
I hope there's no problem discussing it. Of course I wouldn't distribute the dictionaries (if I ever manage to convert them); I might share the conversion program, if that is allowed by the forum rules. Personally I have no problem with trying to convert them, since I bought those dictionaries, and I would even skip the hassle if some were for sale for the Kobo. Unfortunately, dictionaries are one of the most obvious yet worst-implemented features of e-readers in general, not just the Kobo.
Old 12-01-2012, 06:30 AM   #123
ShellShock
Wizard
Posts: 1,176
Karma: 2431850
Join Date: Sep 2008
Device: IPad Mini 2 Retina
Have you seen the instructions attached to the first post in this thread? That gives the steps you need to create a Kobo dictionary, all of which can be automated. There are three main steps:

1. Converting the source dictionary into the html format used by the Kobo.
2. Creating an index from the html.
3. Packaging the html and index into the Kobo dictionary.

Steps 2 & 3 are generic and are good candidates for a generic program; in fact I have code that does this already. I'm thinking about tidying it up and making it generally available; I don't know if there is much demand for it. It is Windows only.

Step 1 is the most time consuming and is bespoke to each source dictionary, as they all have their own formats. Again, for this step I write a program, but it is specific to each source dictionary. Depending on the source, it can be done manually using a text editor and, e.g., regular expressions, or perhaps XSLT transformations if you have the skill.

I think AlPe is also working on a converter program; see here.
Old 12-01-2012, 09:16 AM   #124
Papi
Addict
Posts: 311
Karma: 547600
Join Date: Jul 2010
Location: Paris
Device: Kindle Keyboard, Kindle NT, PRS-650
Yep, thanks. I've read your thread plus this one; thanks to everyone's work here, it's pretty clear what has to be done. What I don't know is what the Mobipocket dictionary format looks like, but I think that doesn't belong in this thread. I'll do some research now...
Old 12-03-2012, 06:28 AM   #125
tshering
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:
Originally Posted by Papi
Also in French we have something very annoying for dictionaries, and it's indeed not handled by the French Larousse dictionary found in the Kobo: s', l', m', t' that can precede a verb or a noun. For example, abris -> l'abris. If I put l'abris as a variant of abris, as far as I understood, it won't work (as it didn't with go/went).
I cannot see that this should be a major problem. Usually the main entries of dictionaries do not contain, for instance, articles. You would not expect to find "the sun" in an English dictionary, or "le soleil" in a French dictionary. So why would you expect to find "l'abri(s)"? Putting all possible combinations into the dictionary would increase it considerably, as you said:
Quote:
Originally Posted by Papi
Maybe a file l'.html would work (as shown in the o'clock example), but it would contain a lot of words, basically all the nouns and verbs starting with a vowel.
Therefore, if there is really a need to take care of "l'abri", "s'attarder" and so on this would be better done by the software by removing l', s', qu' and so on while searching. And the KT does this already in a simple way. If you point your finger on "l'heure" it will auto-select "heure" and the correct definition will pop up. I found only a few expressions in the Larousse that would get "lost" in this way, namely "qu'en-dira-t-on", "c'est-à-dire" and "m'as-tu-vu". There might be some more cases, but I think this mechanism works in general rather well. To my mind, adding all l'-expressions and so on would be more pain than gain.
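The search-side approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `strip_elision`, the particle list (which goes beyond the l', s', qu' mentioned here) and the exception set are hypothetical, and nothing is known here about how the Kobo firmware actually does it.

```python
import re

# Elided particles that can precede a French word starting with a vowel or mute h.
# This list is an assumption for illustration; it is not taken from the Kobo firmware.
ELISIONS = ("l", "s", "m", "t", "d", "n", "j", "c", "qu")

# Headwords that legitimately start with an elided form and must be kept whole
# (the three Larousse entries mentioned above).
KEEP_WHOLE = {"qu'en-dira-t-on", "c'est-à-dire", "m'as-tu-vu"}

# One particle followed by a typewriter (U+0027) or typographic (U+2019) apostrophe.
_PREFIX = re.compile("^(?:%s)['\u2019]" % "|".join(ELISIONS), re.IGNORECASE)


def strip_elision(selection):
    """Return the dictionary lookup key for a selected word."""
    word = selection.strip()
    # Compare against the exception list with apostrophes unified to U+0027.
    if word.lower().replace("\u2019", "'") in KEEP_WHOLE:
        return word
    # Otherwise drop a single leading particle, if there is one.
    return _PREFIX.sub("", word, count=1)
```

With this rule, "l'heure" is looked up as "heure", while "c'est-à-dire" is left alone because it is listed as an exception.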
Quote:
Originally Posted by Papi
... what I don't know is what the mobipocket dictionary format looks like, but I think it doesn't belong in this thread, I'll do some research
If you did not find anything that worked for you, you can try converting it to EPUB with calibre. This will give you one or several (X)HTML files. However, I have not tried it myself.
Old 12-03-2012, 07:36 AM   #126
Papi
Addict
Posts: 311
Karma: 547600
Join Date: Jul 2010
Location: Paris
Device: Kindle Keyboard, Kindle NT, PRS-650
Thanks for your answer tshering.

Quote:
Originally Posted by tshering
Therefore, if there is really a need to take care of "l'abri", "s'attarder" and so on this would be better done by the software by removing l', s', qu' and so on while searching. And the KT does this already in a simple way. If you point your finger on "l'heure" it will auto-select "heure" and the correct definition will pop up. I found only a few expressions in the Larousse that would get "lost" in this way, namely "qu'en-dira-t-on", "c'est-à-dire" and "m'as-tu-vu". There might be some more cases, but I think this mechanism works in general rather well. To my mind, adding all l'-expressions and so on would be more pain than gain.
I must have missed something, because I agree it should be handled by the software: just consider ' to be a word delimiter. But on my Glo, if I put my finger on l'abris and wait for the dictionary to appear, it says it cannot find an entry for "l'abris", so it does consider it to be one word. Maybe I have to resize the selection; maybe I can get rid of the l' by moving the left delimiter.

Quote:
Originally Posted by tshering
If you did not find anything that worked for you, you can try converting it to EPUB with calibre. This will give you one or several (X)HTML files. However, I have not tried it myself.
Thanks, I haven't tried that. I tried moby_unpack, which seems to work fine, except that it outputs an HTML file consisting of one huge line of approximately 60 MB (and that's for my smallest dictionary). Now I have to decide on the best language to handle that. I think C would be best, but it's been a long time since I last programmed in C; maybe Perl or Python, but I think I would have to reformat the file first.
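For what it's worth, a single 60 MB line does not have to be reformatted first if it is read in fixed-size chunks rather than line by line. A minimal Python sketch, assuming the entries are separated by some fixed marker string; the `<hr/>` used in the demo is purely hypothetical, since the real separator depends on what the unpacked file contains:

```python
import io


def iter_records(stream, delimiter, chunk_size=1 << 20):
    """Yield the pieces of a text stream that are separated by `delimiter`.

    Reads `chunk_size` characters at a time, so a file made of one huge
    line never has to be loaded as a single string.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(delimiter)
        # The last piece may be incomplete (or end in half a delimiter): keep it.
        buffer = parts.pop()
        for part in parts:
            yield part
    if buffer:
        yield buffer


# Tiny in-memory demo; a real run would pass an open file object instead.
demo = io.StringIO("a<hr/>bb<hr/>ccc")
records = list(iter_records(demo, "<hr/>", chunk_size=4))
```

Because the unfinished tail is carried over to the next chunk, a delimiter that straddles a chunk boundary is still found.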
Old 12-03-2012, 08:18 AM   #127
tshering
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:
Originally Posted by Papi
But on my Glo, if I put my finger on l'abris and wait for the dictionary to appear, it says it cannot find an entry for "l'abris", so it does consider it to be one word.
This is strange. On my Touch, I really have to try hard if I want l' to be selected together with the following word.
Old 12-03-2012, 08:44 AM   #128
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Quote:
Originally Posted by ShellShock
Have you seen the instructions attached to the first post in this thread? That gives the steps you need to create a Kobo dictionary, all of which can be automated. There are three main steps:

1. Converting the source dictionary into the html format used by the Kobo.
2. Creating an index from the html.
3. Packaging the html and index into the Kobo dictionary.

Steps 2 & 3 are generic and are good candidates for a generic program; in fact I have code that does this already. I'm thinking about tidying it up and making it generally available; I don't know if there is much demand for it. It is Windows only.

Step 1 is the most time consuming and is bespoke to each source dictionary, as they all have their own formats. Again, for this step I write a program, but it is specific to each source dictionary. Depending on the source, it can be done manually using a text editor and, e.g., regular expressions, or perhaps XSLT transformations if you have the skill.

I think AlPe is also working on a converter program; see here.
Yes, that's correct. I want to add "output to Kobo format" to Penelope, my script that converts XML-like or Stardict dictionaries to the format required by the Bookeen Cybook Odyssey.

I really wanted to spend some time on it last weekend, but my spare time was consumed discussing interesting stuff about (our) Audio-eBooks with DAISY.

I will try to write the output function by the end of this week.

Last edited by AlPe; 12-03-2012 at 08:47 AM.
Old 12-03-2012, 02:43 PM   #129
tshering
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Quote:
Originally Posted by Papi
But on my Glo, if I put my finger on l'abris and wait for the dictionary to appear, it says it cannot find an entry for "l'abris", so it does consider it to be one word.
Did you try this with several books? Maybe your book uses a character for the apostrophe that the reader does not recognize as a delimiter.
Old 12-05-2012, 02:48 AM   #130
Papi
Addict
Posts: 311
Karma: 547600
Join Date: Jul 2010
Location: Paris
Device: Kindle Keyboard, Kindle NT, PRS-650
I tried yesterday: out of 4 books, one reacted as expected (' is a delimiter), while the other 3 had an apostrophe that is not treated as a delimiter. Weird. I guess it's not exactly the same character; I should check what's inside the file. But I also realized you can adjust the selection to whatever character you want, so it's fine, just a bit more annoying, and I won't try to deal with that at the dictionary level.
Old 12-05-2012, 01:02 PM   #131
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
In many cases, eBook typographers use a typographical apostrophe (U+2019), which renders better than the typewriter apostrophe (U+0027).

See: http://en.wikipedia.org/wiki/Apostrophe#Unicode
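
To check which of the two characters a given book uses, or to map one onto the other before a lookup, something like the following Python sketch would do. The helper names are made up for illustration, and this says nothing about how the Kobo itself treats either character.

```python
# Map the typographic apostrophe (U+2019) to the typewriter apostrophe (U+0027).
APOSTROPHE_MAP = {0x2019: 0x0027}


def normalize_apostrophes(text):
    """Replace U+2019 with U+0027 so both spellings give the same lookup key."""
    return text.translate(APOSTROPHE_MAP)


def apostrophe_codepoints(text):
    """List which of the two apostrophe characters occur in `text`."""
    return sorted({"U+%04X" % ord(ch) for ch in text if ch in "'\u2019"})
```

Running `apostrophe_codepoints` over the text of a book shows at a glance whether it contains U+0027, U+2019, or both.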
Old 12-08-2012, 01:49 PM   #132
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Hi all,

I implemented the output to Kobo format into Penelope, my dictionary-converter tool.

Source code: http://code.google.com/p/penelope-dictionary-converter/
Doc/explanation: http://www.albertopettarin.it/penelope.html

Note that you will probably need Python 2.6+ (not Python 3.x) to run it.

Example:
Code:
$ python penelope.py --output-kobo -p bar -f en -t it
Create English-to-Italian dictionary in Kobo format (dicthtml-en-it.zip), from StarDict files bar.*

Example 2:
Code:
$ python penelope.py --xml --output-kobo -p bar -f en -t en
Create English dictionary in Kobo format (dicthtml.zip), from XML file bar.xml

===

NOTE 1: the handling of 11.html entries is probably not completely correct right now. At the moment every key NOT starting with [a-z][a-z] (when lowercased) goes to 11.html. Better ideas?

NOTE 2: you will need to modify MARISA_BUILD_PATH in penelope.py, pointing it to the directory containing a working build of MARISA.

Last edited by AlPe; 12-08-2012 at 02:03 PM.
Old 12-08-2012, 03:11 PM   #134
ShellShock
Wizard
Posts: 1,176
Karma: 2431850
Join Date: Sep 2008
Device: IPad Mini 2 Retina
I suspect that 11.html is a "catch-all" file for entries that are not found elsewhere. I have had some success with files using characters other than a-z, so I suggest you try it and see.
Old 12-08-2012, 03:41 PM   #135
AlPe
Digital Amanuensis
AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.AlPe ought to be getting tired of karma fortunes by now.
 
AlPe's Avatar
 
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
If 11.html is a "catch-all", then my current code is good.

Indeed, right now my code does the following: if the lower-cased version of a keyword starts with [a-z][a-z], then it appends that keyword (and its definition) to the corresponding file; otherwise, it appends it to 11.html.

Example:
Code:
argon -> ar.html
yoga -> yo.html
a- -> 11.html
-meter -> 11.html
o'clock -> 11.html
My earlier doubt was raised by the fact that, in the official Italian dictionary, I found files such as àa.html and wü.html, suggesting that accented characters are also "allowed" there.

To confirm this I will experiment a bit, but now it is quite late...
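
The routing rule described above can be written as a short Python function. This is a sketch of the rule as stated in this post, not a copy of Penelope's source; the name `bucket_file` is made up for illustration.

```python
import re

# Two ASCII letters at the very start of the (lower-cased) keyword.
_TWO_LETTERS = re.compile(r"[a-z][a-z]")


def bucket_file(keyword):
    """Return the name of the HTML file a keyword is appended to."""
    key = keyword.lower()
    if _TWO_LETTERS.match(key):
        return key[:2] + ".html"
    # Everything else (digits, punctuation, accented letters, one-letter keys)
    # falls through to the catch-all file.
    return "11.html"
```

Note that under this rule a keyword starting with an accented letter also lands in 11.html, which is exactly the case that the àa.html and wü.html files in the official Italian dictionary call into question.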