03-04-2012, 08:08 PM | #31 |
Wizard
Posts: 4,742
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
I wasn't trying to rush you at all, just to give you more options.
If you are using Wheezy, you could even look into packaging it up as a deb package, or have somebody else do it. Maybe Debian would even include it. Typing "sudo apt-get install Penelope" would sure beat any other method of installing it, especially since dependencies can be taken care of automatically by the Debian system. But that depends on what you are planning to do with it. Since Penelope is already released under the GPL, it only makes sense, imo. |
03-05-2012, 03:41 AM | #32 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
I see your point, I'll think about it; however, since I'm a perfectionist, I want it to have a man page, etc. ... i.e., some more work to be done.
BTW, if you are running Debian, you just need to install python (with python-pysqlite2) to run Penelope. |
03-07-2012, 02:46 AM | #33 |
Wizard
Posts: 4,742
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
I have a few things to say about Penelope. I finally downloaded it and looked through your code. It appears to me that you hacked the XML support in afterwards, especially since you parse the file by hand, reinventing the wheel. Is that what happened?
For StarDict the custom parser might be useful, but for XML it is not. The parser is invoked after you have already ripped the XML file apart. Any useful information there might have been, other than key and def, is gone. If you are able to write your own custom parser, then you should be able to output an XML file with all the information necessary for Penelope, e.g. synonyms as an optional part of an <entry>. I'll rewrite your read_from_xml_format() - maybe you will like it. Also, I do not yet quite understand what the difference between a substitution and a synonym is. Wouldn't it make sense to simply add synonyms to your global substitution list and let them be added at the end? |
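A rough sketch of what such a rewrite could look like with the standard library's ElementTree instead of a hand-written parser. The element names (`dictionary`, `entry`, `key`, `def`, `syn`) and the function name `read_entries_from_xml` are assumptions made up for illustration; they are not Penelope's actual input format or code.

```python
import xml.etree.ElementTree as ET


def read_entries_from_xml(xml_text):
    """Return a list of (key, synonyms, definition) tuples.

    Assumed (hypothetical) layout: each <entry> holds one <key>, one <def>,
    and zero or more optional <syn> children.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        key = (entry.findtext("key") or "").strip()
        definition = (entry.findtext("def") or "").strip()
        if not key:
            continue  # skip malformed entries that have no key
        synonyms = [
            syn.text.strip()
            for syn in entry.findall("syn")
            if syn.text and syn.text.strip()
        ]
        entries.append((key, synonyms, definition))
    return entries


# Tiny sample document in the assumed layout.
SAMPLE = """<dictionary>
  <entry><key>abbazia</key><syn>abbadia</syn><def>abbey</def></entry>
  <entry><key>amico</key><def>friend</def></entry>
</dictionary>"""

entries = read_entries_from_xml(SAMPLE)
```

Because `<syn>` is optional, files that only carry key and def would still load unchanged, and the synonyms survive up to the point where the custom parser runs.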
03-07-2012, 03:29 AM | #34 | ||
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Quote:
Yes, the XML parser was added later. But its philosophy is clear: start from a set of (word, definition) pairs and read it. This is generally the case when you unpack a MOBI dictionary, for example. In that case, the custom parser part is still useful. For example, I might decide to extract synonyms from the definitions (say, if my definitions have a <p>SYN: ... </p>, but only for some words). But I agree that, in general, one can code a more complex "XML" parser. I just wanted it to be quick and dirty, with the bare minimum needed by the remaining code, as I explain on the web page. But if you want to send me your code, I will look at it and integrate it into the tool. Quote:
Code:
[ word, include, synonyms, substitutions, definition ]
The net effect of synonyms is that an entry is created for "word", both in the index and in the definition files, while for each synonym only the index entry is created, pointing at the same definition as "word".
The net effect of substitutions is that an entry is created for "word", pointing at the definition of "substitute_with_some_other_word".
The functional difference is that substitutions can be done only after all the definitions are actually written to disk; that's why they are accumulated and processed together at the end. On the other hand, synonyms can be inserted in the index immediately.
Three motivating examples for this strategy.
1) When you parse stuff like a Wiktionary, you have lots of pages of the form "mice is the plural of mouse" (two pages: "Mice" and "Mouse"). If you don't want to create a definition of "mice", but still want the definition of "mouse" displayed, then when processing "mice" you can set up a substitution: the dictionary will not contain a definition for "mice", but when you select "mice" in your document you will get the definition for "mouse". But since you will encounter "mice" before "mouse", you do not know which position in the definition file your "mice" index record must point at. So, you will use a substitution in this case. (Note that my code does not check that a definition for "Mouse" actually exists.)
2) Another example occurs quite often in Italian, where adjectives have suffixes for masculine/feminine and singular/plural (amico, amica, amici, amiche are the four forms corresponding to "friend"). Usually in the dictionary you will find only the masculine singular (amico). But you might want all four versions to point at the same definition (amico, amica, amici, amiche -> amico), without having defs for "amica", "amici", "amiche".
3) Sometimes a word has more than one spelling. Again, this is particularly true in Italian, where ancient spellings coexist with modern ones (say, "abbazia" and "abbadia" for "abbey"). Usually you will find only the modern term listed in the dictionary, but in its definition you will find something like "ANCIENT SPELLING: ...". In this case, you parse the definition of "abbazia", find out that there is also the ancient spelling "abbadia", and add it to the "abbazia" tuple as a synonym. Doing so will create two entries in the index (one for "abbazia", one for "abbadia") pointing at the same definition.
Last edited by AlPe; 03-07-2012 at 03:33 AM. |
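To make the synonym/substitution distinction concrete, here is a minimal sketch of the two-phase idea described in this post. It is not Penelope's code: `build_index` is a hypothetical name, a list position stands in for an offset into the definition file, `include` is assumed to be a flag saying whether the word gets its own definition, `substitutions` is assumed to be a list of target words, and the sample definitions are made up.

```python
def build_index(entries):
    """entries: iterable of [word, include, synonyms, substitutions, definition].

    Returns (index, definitions). index maps a word to a position in the
    definitions list, which stands in for an offset into the definition file.
    """
    index = {}
    definitions = []
    pending = []  # (sub_from, sub_to) pairs, resolved once every definition exists

    for word, include, synonyms, substitutions, definition in entries:
        if include:
            position = len(definitions)
            definitions.append(definition)
            index[word] = position
            # Synonyms can be indexed right away: the position is already known.
            for synonym in synonyms:
                index[synonym] = position
        # The target of a substitution may not have been seen yet, so defer it.
        for target in substitutions:
            pending.append((word, target))

    # Second phase: every definition is "written", so targets can be looked up.
    for sub_from, sub_to in pending:
        if sub_to in index:
            index[sub_from] = index[sub_to]
    return index, definitions


entries = [
    ["mice", False, [], ["mouse"], ""],           # seen before "mouse"
    ["mouse", True, [], [], "a small rodent"],    # made-up definition text
    ["abbazia", True, ["abbadia"], [], "abbey"],
]
index, definitions = build_index(entries)
```

In this sample, "mice" can only be resolved in the second loop because "mouse" comes later in the input, while "abbadia" is indexed in the same step as "abbazia".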
||
03-07-2012, 03:51 AM | #35 |
Wizard
Posts: 4,742
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
Ahh, thank you, that makes sense, especially with Italian being so weird. By the way, you do check whether Mouse exists when you deal with substitutions:
Code:
if sub_to in global_dictionary:
    # Copy the record found for sub_to, replacing its second field with sub_from
    sql_tuple = global_dictionary[sub_to]
    sql_tuple = ( sql_tuple[0], sub_from, sql_tuple[2], sql_tuple[3], sql_tuple[4] )
|
03-07-2012, 03:53 AM | #36 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
OK, great, I must have fixed that at some point and then forgotten about it!
Still, if you define a substitution but the sub_to does not exist, you lose the sub_from stuff, I guess. |
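One way to avoid losing those entries silently would be to collect the substitutions whose target is missing and report them. This is only a sketch: `global_dictionary`, `sub_from` and `sub_to` follow the snippet quoted above, while `resolve_substitutions`, its return value, and the field values in the sample record are assumptions for illustration.

```python
def resolve_substitutions(global_dictionary, substitutions):
    """Split (sub_from, sub_to) pairs into resolved records and dangling pairs."""
    resolved = []
    dangling = []
    for sub_from, sub_to in substitutions:
        if sub_to in global_dictionary:
            record = global_dictionary[sub_to]
            # Same shape as the target's record, with sub_from in the second field.
            resolved.append((record[0], sub_from) + tuple(record[2:]))
        else:
            # Keep track of the pair instead of dropping it without a trace.
            dangling.append((sub_from, sub_to))
    return resolved, dangling


# Placeholder record: only its five-field shape matters here.
global_dictionary = {"mouse": (0, "mouse", 0, 14, "a small rodent")}
resolved, dangling = resolve_substitutions(
    global_dictionary, [("mice", "mouse"), ("geese", "goose")]
)
for sub_from, sub_to in dangling:
    print("warning: %r points at missing entry %r" % (sub_from, sub_to))
```

Whether a dangling substitution should be a warning, an error, or an entry with a placeholder definition is a design choice left open here.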
03-07-2012, 09:42 AM | #37 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Hi,
I have just opened the following project at Google Code: http://code.google.com/p/penelope-dictionary-converter/ If you want to contribute some code, please let me know via email or PM, and I will add you to the project. |
03-07-2012, 02:00 PM | #38 | |
Wizard
Posts: 4,742
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
Quote:
|
|
03-13-2012, 10:34 AM | #39 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
I have just pushed a new version of Penelope that allows you to output in StarDict format.
This is useful if you want to convert an XML dictionary into StarDict, or you want to write your own parser to manipulate an existing StarDict dictionary. You can find the code at the Google Code project: http://code.google.com/p/penelope-dictionary-converter/ and the "manual" here: http://www.albertopettarin.it/penelope.html (eventually, it will be moved to the wiki of the Google Code project). Last edited by AlPe; 12-11-2012 at 03:02 PM. Reason: Updated the link to my homepage |
03-21-2012, 10:41 PM | #40 | |
Wizard
Posts: 4,742
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
Quote:
At least once it is all moved, it has the big advantage of being a Mercurial repository with full-featured revision control and the ability to commit independently of you. I would appreciate some feedback before I get everything moved, only to find out that I need to redo it all because of some strange reformatting.
|
05-21-2012, 05:26 AM | #41 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Since Bookeen has not implemented a "search in dictionary" function yet, I am attaching to this post a sample EPUB file that can be used to "emulate" the search function.
It contains the list of words, grouped by starting letter and four-letter prefix, so that one can "navigate" through the EPUB, find the desired word, select it, and have the dictionary definition displayed.
I attach the EPUB generated for the ENGLISH dictionary that I helped develop for SBF, based on the GNU Collaborative International Dictionary of English (despite being named "en.wikipedia.dict"). You can get it here:
DICT.IDX: http://bit.ly/wfCrcK
DICT: http://bit.ly/Ao8JjI
Usage (looking for "yerd"):
1) Open dictionary.v1.epub on the Odyssey, like any other EPUB file.
2) Select "Letter Y".
3) Select "YEAS - YIN" (as "yerd" falls into this interval).
4) Locate "yerd" and select it.
5) The dictionary should pop up the definition of "yerd".
This test EPUB was generated by a Python script that will be merged into the Penelope project, time permitting. Let me know what you think about it, or if you have any suggestions/comments. |
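For anyone curious how such a word list can be organised, here is a small sketch of the grouping step only (no EPUB writing). It is not the script mentioned above: `group_words` and `chunk_size` are hypothetical, and labelling each page with the four-letter prefixes of its first and last word is my guess at how intervals like "YEAS - YIN" come about.

```python
from itertools import groupby


def group_words(words, chunk_size=200):
    """Return {letter: [(label, [words...]), ...]} for a navigable word list."""
    # Lowercase, de-duplicate, and sort so equal first letters are adjacent.
    ordered = sorted(set(w.lower() for w in words if w))
    grouped = {}
    for letter, letter_words in groupby(ordered, key=lambda w: w[0].upper()):
        letter_words = list(letter_words)
        chunks = []
        for start in range(0, len(letter_words), chunk_size):
            chunk = letter_words[start:start + chunk_size]
            # Label the page with the prefixes of its first and last word.
            label = "%s - %s" % (chunk[0][:4].upper(), chunk[-1][:4].upper())
            chunks.append((label, chunk))
        grouped[letter] = chunks
    return grouped


grouped = group_words(["yeast", "yin", "yerd", "yellow", "zebra"], chunk_size=4)
```

Each letter would then become one section of the EPUB and each labelled chunk one page of selectable words.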
05-25-2012, 05:32 PM | #42 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Any comments or suggestions from the six people who downloaded the EPUB file?
I am planning to release the "final version" of the EPUB "dictionary" next week, so I would be very grateful if you could post your thoughts here. Thanks! |
08-10-2012, 10:27 AM | #43 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2012
Device: Sony Reader PRS-650, CyBook Odyssey, PocketBook Touch 622
|
EL-EN requested
Quote:
Thanks in advance! |
|
08-20-2012, 06:13 AM | #44 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
|
08-22-2012, 09:34 AM | #45 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2012
Device: Sony Reader PRS-650, CyBook Odyssey, PocketBook Touch 622
|
|
|