Old 07-09-2017, 09:11 PM   #1
MGlitch
Wizard
Posts: 1,657
Karma: 7750996
Join Date: Aug 2014
Device: Kobo Forma
help with penelope OS X

I'm trying to convert a StarDict dictionary to a Kobo dictionary, for what should be obvious reasons. I've got penelope installed, and I have marisa installed. penelope is 'working' in that I can type penelope followed by various commands and it does something, e.g. penelope -h does generate a list of commands I can use.

However, when I try to convert a StarDict dictionary to Kobo, I get

Code:
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o mk-it
[INFO] Reading input file(s)...
Traceback (most recent call last):
  File "/usr/local/bin/penelope", line 27, in <module>
    main()
  File "/usr/local/bin/penelope", line 23, in main
    package_main()
  File "/Library/Python/2.7/site-packages/penelope/__main__.py", line 84, in main
    dictionary = read_dictionary(arguments)
  File "/Library/Python/2.7/site-packages/penelope/dictionary.py", line 80, in read_dictionary
    return penelope.format_stardict.read(dictionary, args, input_file_paths)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 259, in read
    result = read_single_file(dictionary, args, input_file_path)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 187, in read_single_file
    ifo_dict = read_ifo(extracted_files["d.ifo"], has_syn, args)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 100, in read_ifo
    ifo_unicode = ifo_bytes.decode("utf-8")     # unicode, always utf-8 by spec
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 213: invalid start byte
The StarDict dictionary in question is from http://thefictionary.net/steven-erikson/ and I'm using version 14 from that site.

A quick Google search indicates this may be an issue with a non-ASCII character, but I'm in the dark as to how to resolve it.

Any help would be appreciated. I want to learn how to do this, since the dictionary in question will be updated at later dates as more books in the Malazan series are released.
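
For anyone hitting the same traceback, here is a minimal sketch (Python 3 assumed) of how the offending byte could be located. The helper name and the sample bytes are made up for illustration and are not part of penelope. It may be worth knowing that 0x92 is the right single quotation mark (’) in the Windows-1252 code page, which would suggest the .ifo file was saved in that encoding rather than UTF-8.

```python
def describe_decode_failure(data, encoding="utf-8", context=20):
    """Return (position, offending_byte, surrounding_bytes), or None if data decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        start = max(0, err.start - context)
        bad = data[err.start:err.start + 1]
        return err.start, bad, data[start:err.start + context]
    return None


# Hypothetical sample standing in for the contents of a .ifo file
sample = b"description=Erikson\x92s world"
print(describe_decode_failure(sample))

# The same byte decodes fine as Windows-1252 (a curly apostrophe)
print(repr(b"\x92".decode("cp1252")))
```

On a real dictionary, the bytes would come from something like open(path_to_ifo, "rb").read() after unzipping the archive.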
Old 07-10-2017, 03:40 AM   #2
tshering
Wizard
Posts: 3,469
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
Maybe it helps if you run this before calling penelope:
Code:
export PYTHONIOENCODING=utf-8
Old 07-10-2017, 09:46 AM   #3
MGlitch
Putting that into Terminal and then calling penelope did not resolve the issue; I get the same error.

Just to be clear, I entered it as follows:

export PYTHONIOENCODING=utf-8
(ran that, terminal did not appear to do anything)
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o
(this generates the same dump I posted above)
Old 07-10-2017, 04:56 PM   #4
tshering
It would be better if someone other than me, who actually has some knowledge of Python, tried to help.
The problem seems to be that penelope expects the input file to be UTF-8 encoded, and your input dictionary has some other encoding.
On the GitHub page of penelope, I see that there is an optional argument to set the encoding of the input file:
Code:
--input-file-encoding INPUT_FILE_ENCODING
                        use the specified encoding for reading the raw
                        contents of input file(s) (default: 'utf-8')
Maybe you can figure out which encoding your input dictionary has. Maybe it is latin-1.
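
One caveat, going only by the traceback earlier in the thread: read_ifo appears to decode the .ifo file as UTF-8 unconditionally (the source comment says "always utf-8 by spec"), so the option above may not apply to that file. An alternative is to re-encode the .ifo to UTF-8 before zipping it again. The following is a minimal sketch under that assumption (Python 3 assumed); the function name and the candidate list are illustrative, not part of penelope. Note that latin-1 accepts every byte, so it only works as a last resort, and that 0x92 is a printable apostrophe in cp1252 but a control character in latin-1.

```python
def reencode_to_utf8(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode with the first candidate encoding that works; return (utf8_bytes, encoding_used)."""
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # this guess does not fit the bytes, try the next one
        return text.encode("utf-8"), enc
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical bytes standing in for a .ifo file saved as Windows-1252
converted, used = reencode_to_utf8(b"description=Erikson\x92s world")
print(used)
print(converted.decode("utf-8"))
```

This should only be applied to the plain-text .ifo file. As far as I understand the StarDict format, the .idx and .dict files contain binary data, so re-saving them through a text conversion would likely corrupt them.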

Last edited by tshering; 07-10-2017 at 05:10 PM.
Old 07-10-2017, 07:04 PM   #5
MGlitch
Even after using TextEdit to make sure they're encoded as UTF-8, this doesn't work. I'd found a command that reports the encoding, but it gave unknown-8 for the files. So I opened each one in TextEdit, made a copy, saved it with UTF-8 encoding, and recompressed using just the new files. I still get an error, though it's slightly different.

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

This may be a lost cause. I don't know who actually made the StarDict dictionary, nor what tools they used to do so. There is a .mobi version, though I don't think that does me any good either.

Frustrating since with a series like Malazan it would be really nice to have a dictionary of terms and names.
Old 07-11-2017, 05:45 AM   #6
embryo
Evangelist
Posts: 419
Karma: 54558
Join Date: Aug 2013
Device: Kobo Glo
Quote:
Originally Posted by MGlitch View Post
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte
This error means that it cannot decode the first byte of the file.
It might be that this is the BOM character that some programs put at the beginning of a UTF-8 file.
Some time ago I made a utility to convert between ANSI and UTF-8 encodings, with the option to add the BOM character or leave it out. See if it can help you. It should be in the Software section of www.noembryo.com, and it's called Subber (it was made for subtitles...).
The direct link is http://www.noembryo.com/apps.php?subber
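
For a check that does not need Windows, here is a minimal Python sketch (Python 3 assumed) that tests for a UTF-8 BOM and strips it; the function names are made up for illustration. The UTF-8 BOM is the three bytes EF BB BF, so a file that starts with one begins with 0xef. A first byte of 0xbf therefore may point to something other than a plain BOM, for example a binary file that was changed when it was re-saved from a text editor, though that is only a guess.

```python
import codecs


def has_utf8_bom(data):
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, unchanged if there is none."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents with and without a BOM
with_bom = codecs.BOM_UTF8 + b"StarDict's dict ifo file"
print(has_utf8_bom(with_bom))
print(strip_utf8_bom(with_bom))
print(hex(with_bom[0]))
```

Printing the first few bytes of each extracted file, e.g. open(path, "rb").read(4), shows directly which file starts with 0xbf.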

Last edited by embryo; 07-11-2017 at 05:47 AM.
Old 07-12-2017, 04:54 PM   #7
MGlitch
I don't have a Windows machine currently, so that won't work. I may look into this again in a bit, possibly this weekend.