Old 12-08-2012, 03:52 PM   #136
tshering
Wizard
Posts: 1,301
Karma: 337944
Join Date: Jun 2012
Device: kobo touch
I am not sure what exactly the "idea" behind 11.html is. As the case of "o'clock" shows (it works both in 11.html and in o'.html), the rules are not so clear-cut. To my mind the rule should be: put into 11.html all cases that would otherwise result in an invalid filename (on any OS?), with the period not being accepted as part of the filename proper. I have not, however, tested whether the search engine finds, for instance, "3rd" in 3r.html.
Old 12-08-2012, 04:07 PM   #137
tshering
Wizard
Quote:
Originally Posted by AlPe View Post
... every key NOT starting as [a-z][a-z] (when lowercased) will go to 11.html.
This sends all expressions of some languages to 11.html.
Old 12-08-2012, 04:18 PM   #138
AlPe
Digital Amanuensis
Posts: 595
Karma: 1281565
Join Date: Dec 2011
Location: Padova, Italy
Device: Kindle3, Odyssey, eDGe, A60, PRS-T1, iPad3, KoboGlo
Quote:
Originally Posted by tshering View Post
This sends all expressions of some languages to 11.html.
Do not take that expression too literally: it was shorthand for "every word (lowercased) NOT starting with two letters in the range a-z". My code actually generates files other than 11.html.
Old 12-08-2012, 04:23 PM   #139
tshering
Wizard
To be more clear, I am thinking of languages that do not use Latin script.
Old 12-08-2012, 04:25 PM   #140
AlPe
Digital Amanuensis
I tried this dictionary:

Code:
<?xml version = "1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE document SYSTEM "dictionary.dtd"> 
<dictionary>
<entry>
 <key>Würstel</key>
 <def>Definition of <i>Würstel</i>, with upper case letter.</def>
</entry>
<entry>
 <key>würstel</key>
 <def>Definition of <i>würstel</i>, with lower case letter.</def>
</entry>
<entry>
 <key>1826</key>
 <def>Test of <i>1826</i> one-eight-two-six.</def>
</entry>
<entry>
 <key>è</key>
 <def>"è" is the Italian equivalent of "is".</def>
</entry>
<entry>
 <key>Leopardi</key>
 <def><i>Leopardi</i> is the name of a famous Italian poet.</def>
</entry>
<entry>
 <key>leopardi</key>
 <def><i>leopardi</i> means "leopards" in Italian.</def>
</entry>
<entry>
 <key>égloga</key>
 <def>Type of ancient poem.</def>
</entry>
</dictionary>
My code generates le.html (Leopardi and leopardi) and 11.html (all the other words). All the keys are present in the words file, and the search function on the Kobo recognizes them (I tested this by typing them into the search box).
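
The grouping described in this post can be sketched roughly as follows. This is only an illustration of the "two letters in a-z, otherwise 11.html" rule as stated in the thread; `file_prefix` and `group_keys` are hypothetical names and are not taken from Penelope's source.

```python
import re
from collections import defaultdict

# A prefix is kept only if it is exactly two ASCII letters (after lowercasing).
TWO_ASCII_LETTERS = re.compile(r"[a-z][a-z]")


def file_prefix(key):
    """Return the XX part of XX.html for a key, or "11" as the catch-all."""
    head = key.lower()[:2]
    return head if TWO_ASCII_LETTERS.fullmatch(head) else "11"


def group_keys(keys):
    """Map each file prefix to the list of keys that would land in that file."""
    groups = defaultdict(list)
    for key in keys:
        groups[file_prefix(key)].append(key)
    return dict(groups)


# The seven keys of the test dictionary above.
keys = ["Würstel", "würstel", "1826", "è", "Leopardi", "leopardi", "égloga"]
groups = group_keys(keys)
```

With these keys, only "Leopardi" and "leopardi" share the prefix le; everything else falls through to 11, which matches the two files reported above.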

Searching for "1826" retrieves its definition, while searching for "égloga" or "würstel" (both of which went into 11.html) does not. This could mean that the Kobo expects them to be in ég.html and wü.html. So far so good, but I am puzzled that "è" does not retrieve its definition either, even though it is in 11.html.

Last edited by AlPe; 12-08-2012 at 04:27 PM.
Old 12-08-2012, 04:25 PM   #141
AlPe
Digital Amanuensis
Quote:
Originally Posted by tshering View Post
To be more clear, I am thinking of languages that do not use Latin script.
Yes, certainly. How should this be dealt with?
Old 12-08-2012, 04:34 PM   #142
tshering
Wizard
Quote:
Originally Posted by AlPe View Post
So far so good, but I am puzzled by the fact that also "è" does not retrieve its definition, albeit being in 11.html.
The Kobo expects it to be in éa.html
Old 12-08-2012, 04:37 PM   #143
tshering
Wizard
Quote:
Originally Posted by AlPe View Post
Yes, that's sure. How should this be dealt with?
I think any character beyond 127 is acceptable. For instance, all Japanese characters work. We only have to find out which characters in the range 0-127 are not acceptable.
Old 12-08-2012, 04:48 PM   #144
AlPe
Digital Amanuensis
Quote:
Originally Posted by tshering View Post
The Kobo expects it to be in éa.html
You are right, I tried and I confirm it.
Old 12-08-2012, 04:51 PM   #145
AlPe
Digital Amanuensis
Quote:
Originally Posted by tshering View Post
I think any character beyond 127 is acceptable. For instance all Japanese characters work. We only have to find out which characters from 0-127 are not acceptable.
Seems legit. I will try to check tomorrow. Thanks for the lead.
Old 12-09-2012, 06:42 AM   #146
gouni
Connoisseur
Posts: 86
Karma: 546021
Join Date: Nov 2012
Device: kobo
@AlPe, would you like me to send you my French dictionary for a couple of letters, for your test?
Old 12-09-2012, 08:03 AM   #147
AlPe
Digital Amanuensis
Yes, please, especially if you have already experimented on which keywords should go to which XX.html file.
Old 12-09-2012, 09:37 AM   #148
AlPe
Digital Amanuensis
OK, I have tested the ASCII (0-127) characters, except the control characters, and it seems that you can move into 11.html every keyword that has a non-letter (a letter being [A-Za-z]) among its first two characters.

I attach the XML dictionary I used for testing, plus the "Italian" Kobo dictionary that Penelope compiled from it. (Note: download the XML file and open it with a text editor; it might not display properly in a browser because it contains raw "< > &". Penelope does not use a DOM parser, so it allows those characters to be unescaped.)
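
A test dictionary of this kind could be generated along the following lines. This is a sketch under assumptions: the actual contents of special.xml are not shown in the thread, so the key scheme (test character followed by "test"), the definition text, and the names `build_test_dictionary` and `ASCII_NON_LETTERS` are all hypothetical. The keys are written raw and unescaped, as the note above says Penelope tolerates, so the output is not guaranteed to be well-formed XML.

```python
def build_test_dictionary(chars, suffix="test"):
    """Return an XML test dictionary with one entry per character in chars.

    Each key is the character followed by suffix (a hypothetical scheme).
    Characters such as < > & are deliberately left unescaped.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>',
        '<!DOCTYPE document SYSTEM "dictionary.dtd">',
        "<dictionary>",
    ]
    for ch in chars:
        lines += [
            "<entry>",
            f" <key>{ch}{suffix}</key>",
            f" <def>Key starting with ASCII code {ord(ch)}.</def>",
            "</entry>",
        ]
    lines.append("</dictionary>")
    return "\n".join(lines) + "\n"


# Printable ASCII without the space, the control characters and the letters:
# digits and punctuation only.
ASCII_NON_LETTERS = [chr(c) for c in range(33, 127) if not chr(c).isalpha()]

test_xml = build_test_dictionary(ASCII_NON_LETTERS)
```

This only puts the test character in the first position of the key; checking the second position as well would need a second set of keys.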

I updated the Google Code source code of Penelope, to reflect tshering's suggestion (thanks!) about 1-character keywords.
Attached Files
File Type: xml special.xml (2.0 KB, 39 views)
File Type: zip dicthtml-it.zip (1.1 KB, 24 views)

Last edited by AlPe; 12-09-2012 at 09:40 AM. Reason: Added note about the XML file
Old 12-09-2012, 10:12 AM   #149
gouni
Connoisseur
@AlPe, I am sending you an email.
Old 12-09-2012, 10:34 AM   #150
tshering
Wizard
Quote:
Originally Posted by AlPe View Post
Ok, I have performed a test of ASCII (0-127) characters, except control characters, and it seems that you can move all keywords containing a non-letter (defined as [A-Za-z]) in the first two characters into 11.html.
In order to also work with languages that use scripts other than Latin, e.g. Greek, it would be preferable for your script to send to 11.html only those keywords that must be sent to 11.html, rather than all those keywords that can be sent to 11.html. This should be only a small change in your script, but it would have a great effect on usability.
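
One possible reading of this suggestion, as a hedged sketch: keep the two-character prefix whenever both characters are either ASCII letters or non-ASCII characters (code point above 127), and fall back to 11.html only when an ASCII non-letter appears among the first two characters. Which ASCII characters strictly must go to 11.html is not settled in the thread, so treating every ASCII non-letter that way is an assumption, and `file_prefix` is a hypothetical name rather than Penelope's function. One-character keys are left out on purpose, because the thread gives only a single observation for them.

```python
def file_prefix(key):
    """Return the XX part of XX.html for a key.

    Returns "11" if an ASCII non-letter is among the first two characters,
    and None for keys shorter than two characters (not modeled here).
    """
    head = key.lower()[:2]
    if len(head) < 2:
        return None  # one-character keys need separate handling
    for ch in head:
        # Only ASCII characters can force the catch-all file; anything
        # above code point 127 is assumed to be acceptable as is.
        if ord(ch) < 128 and not ("a" <= ch <= "z"):
            return "11"
    return head
```

Under this rule "würstel" and "égloga" keep their own prefixes (wü and ég), Greek or Japanese keys do the same, and "1826", "3rd" and "o'clock" still go to 11.html.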