|  07-09-2017, 09:11 PM | #1 | 
| Wizard            Posts: 2,857 Karma: 22003124 Join Date: Aug 2014 Device: Kobo Forma, Kobo Sage, Kobo Libra 2 | 
				
				help with penelope OS X
			 
			
I'm trying to convert a StarDict dictionary to a Kobo dictionary, for what should be obvious reasons. I have penelope and marisa installed. penelope is 'working' in that I can type penelope followed by an option and it does something; for example, penelope -h prints the list of options I can use. However, when I try to convert a StarDict dictionary to Kobo, I get:
Code:
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o mk-it
[INFO] Reading input file(s)...
Traceback (most recent call last):
  File "/usr/local/bin/penelope", line 27, in <module>
    main()
  File "/usr/local/bin/penelope", line 23, in main
    package_main()
  File "/Library/Python/2.7/site-packages/penelope/__main__.py", line 84, in main
    dictionary = read_dictionary(arguments)
  File "/Library/Python/2.7/site-packages/penelope/dictionary.py", line 80, in read_dictionary
    return penelope.format_stardict.read(dictionary, args, input_file_paths)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 259, in read
    result = read_single_file(dictionary, args, input_file_path)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 187, in read_single_file
    ifo_dict = read_ifo(extracted_files["d.ifo"], has_syn, args)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 100, in read_ifo
    ifo_unicode = ifo_bytes.decode("utf-8")     # unicode, always utf-8 by spec
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 213: invalid start byte
A quick Google search suggests this may be caused by a character that isn't plain ASCII, but I'm in the dark as to how to resolve it. Any help would be appreciated. I want to learn how to do this because the dictionary in question will be updated as more books in the Malazan series are released. | 
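For readers hitting the same traceback: byte 0x92 cannot start a UTF-8 sequence, but in Windows-1252 it is the right single quotation mark (’). That suggests, without proving, that the .ifo file was saved in a Windows code page. Below is a minimal Python sketch of the difference. The sample bytes and the helper name `try_decode` are made up for illustration and are not taken from MalazanDict or from penelope.

```python
# Hypothetical sample: an apostrophe stored as the Windows-1252 byte 0x92.
sample = b"bookname=Malazan\x92s Dictionary"


def try_decode(data, encodings=("utf-8", "cp1252")):
    """Return (encoding, text) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None, None


encoding, text = try_decode(sample)
print(encoding)  # cp1252, because the UTF-8 attempt fails on byte 0x92
```

Running the same helper on the real .ifo bytes would only show which of the listed encodings decodes without error, not which one the author actually used.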
|   |   | 
|  07-10-2017, 03:40 AM | #2 | 
| Wizard            Posts: 3,489 Karma: 2914715 Join Date: Jun 2012 Device: kobo touch | 
			
It might help to run this before calling penelope:
Code:
export PYTHONIOENCODING=utf-8 | 
|   |   | 
|  07-10-2017, 09:46 AM | #3 | 
| Wizard            Posts: 2,857 Karma: 22003124 Join Date: Aug 2014 Device: Kobo Forma, Kobo Sage, Kobo Libra 2 | 
			
Entering that in Terminal and then calling penelope did not resolve the issue; I get the same error. Just to be clear, I entered it as follows:
export PYTHONIOENCODING=utf-8
(ran that; Terminal did not appear to do anything)
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o
(this generates the same dump I posted above) | 
|   |   | 
|  07-10-2017, 04:56 PM | #4 | 
| Wizard            Posts: 3,489 Karma: 2914715 Join Date: Jun 2012 Device: kobo touch | 
			
It would be better if someone with actual Python knowledge tried to help. The problem seems to be that penelope expects the input file to be UTF-8 encoded, while your input dictionary uses some other encoding. On penelope's GitHub page, I see an optional argument for setting the encoding of the input file:
Code:
--input-file-encoding INPUT_FILE_ENCODING
                        use the specified encoding for reading the raw
                        contents of input file(s) (default: 'utf-8')
Last edited by tshering; 07-10-2017 at 05:10 PM. | 
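The traceback in post #1 shows read_ifo decoding the .ifo with a hard-coded "utf-8", so the flag above may not change how that file is read. One alternative is to re-encode the .ifo before handing the zip to penelope. The sketch below assumes the file is Windows-1252, which the thread does not confirm. The function name `reencode_ifo` and the output path in the usage note are hypothetical.

```python
import zipfile


def reencode_ifo(src_zip, dst_zip, source_encoding="cp1252"):
    """Copy a StarDict zip, rewriting every .ifo member as UTF-8.

    source_encoding is a guess (assumption), not something confirmed
    for this dictionary. All other members are copied byte for byte.
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith(".ifo"):
                # Decode with the assumed encoding, then store as UTF-8.
                data = data.decode(source_encoding).encode("utf-8")
            dst.writestr(info.filename, data)
```

A call might look like `reencode_ifo("MalazanDict.zip", "MalazanDict-utf8.zip")`, after which the new zip is passed to `penelope -i`. The other members are left untouched, so if they were produced with the same wrong encoding this will not be enough on its own.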
|   |   | 
|  07-10-2017, 07:04 PM | #5 | 
| Wizard            Posts: 2,857 Karma: 22003124 Join Date: Aug 2014 Device: Kobo Forma, Kobo Sage, Kobo Libra 2 | 
			
Even after using TextEdit to make sure the files are encoded as UTF-8, this doesn't work. I'd found a command that reports the encoding, but it gave unknown-8 for the files. So I opened each one in TextEdit, made a copy, saved it with UTF-8 encoding, and recompressed using only the new files. I still get an error, though it's slightly different:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte
This may be a lost cause; I don't know who made the StarDict dictionary or what tools they used. There is a .mobi version, though I don't think that does me any good either. Frustrating, since with a series like Malazan it would be really nice to have a dictionary of terms and names. | 
|   |   | 
|  07-11-2017, 05:45 AM | #6 | |
| Fanatic            Posts: 529 Karma: 64554 Join Date: Aug 2013 Device: Kobo Glo, GloHD | Quote: 
 It might be that this character is the BOM (byte order mark) that some programs put at the beginning of a UTF-8 file. Some time ago I made a utility to convert between ANSI and UTF-8 encodings, with the option of adding the BOM or not. See if it can help you. It should be in the Software section of www.noembryo.com, and it's called Subber (it was made for subtitles...). The direct link is http://www.noembryo.com/apps.php?subber
Last edited by embryo; 07-11-2017 at 05:47 AM. | |
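For anyone who wants to test the BOM theory without a separate tool, here is a minimal Python sketch. The UTF-8 BOM is the three bytes EF BB BF, available as `codecs.BOM_UTF8`. The helper names are hypothetical, and whether a BOM is really what breaks this dictionary is unconfirmed.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path):
    """Rewrite the file at path without a leading UTF-8 BOM.

    Returns True if the file was changed, False if it had no BOM.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as f:
            f.write(cleaned)
    return cleaned != data
```

`strip_bom_in_file` would be run on each extracted text file (for example the .ifo) before recompressing; it leaves files without a BOM untouched.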
|   |   | 
|  07-12-2017, 04:54 PM | #7 | 
| Wizard            Posts: 2,857 Karma: 22003124 Join Date: Aug 2014 Device: Kobo Forma, Kobo Sage, Kobo Libra 2 | 
			
I don't have a Windows machine at the moment, so that won't work. I may look into this again in a bit, possibly this weekend.
		 | 
|   |   | 