07-09-2017, 09:11 PM | #1 |
Wizard
Posts: 2,841
Karma: 22003124
Join Date: Aug 2014
Device: Kobo Forma, Kobo Sage, Kobo Libra 2
|
help with penelope OS X
I'm trying to convert a StarDict dictionary to a Kobo dictionary, for what should be obvious reasons. I've got penelope installed, and I have marisa installed. Penelope is 'working' in that I can type penelope followed by various options and it does something, e.g. penelope -h does generate a list of commands I can use.
However, when I try to convert a StarDict dictionary to Kobo format I get:
Code:
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o mk-it
[INFO] Reading input file(s)...
Traceback (most recent call last):
  File "/usr/local/bin/penelope", line 27, in <module>
    main()
  File "/usr/local/bin/penelope", line 23, in main
    package_main()
  File "/Library/Python/2.7/site-packages/penelope/__main__.py", line 84, in main
    dictionary = read_dictionary(arguments)
  File "/Library/Python/2.7/site-packages/penelope/dictionary.py", line 80, in read_dictionary
    return penelope.format_stardict.read(dictionary, args, input_file_paths)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 259, in read
    result = read_single_file(dictionary, args, input_file_path)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 187, in read_single_file
    ifo_dict = read_ifo(extracted_files["d.ifo"], has_syn, args)
  File "/Library/Python/2.7/site-packages/penelope/format_stardict.py", line 100, in read_ifo
    ifo_unicode = ifo_bytes.decode("utf-8") # unicode, always utf-8 by spec
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 213: invalid start byte
A quick Google search indicates this may be an issue with an ASCII character, but I'm in the dark as to how to resolve it. Any help would be appreciated. I want to learn how to do this, since the dictionary in question will be updated at later dates as more books in the Malazan series are released. |
07-10-2017, 03:40 AM | #2 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
|
Maybe it helps if you execute this before calling penelope:
Code:
export PYTHONIOENCODING=utf-8 |
07-10-2017, 09:46 AM | #3 |
Wizard
Posts: 2,841
Karma: 22003124
Join Date: Aug 2014
Device: Kobo Forma, Kobo Sage, Kobo Libra 2
|
Putting that into Terminal and then calling penelope did not resolve the issue; I get the same error.
Just to be clear, I entered it as follows:
export PYTHONIOENCODING=utf-8
(ran that, Terminal did not appear to do anything)
penelope -i /Users/Glitch/Downloads/MalazanDict.zip -j stardict -f en -t en -p kobo -o
(this generates the same dump I posted above) |
07-10-2017, 04:56 PM | #4 |
Wizard
Posts: 3,489
Karma: 2914715
Join Date: Jun 2012
Device: kobo touch
|
It would be better if someone who actually has some knowledge of Python tried to help.
The problem seems to be that penelope expects the input file to be UTF-8 encoded, and your input dictionary has some other encoding. On the GitHub page of penelope, I see that there is an optional argument to set the encoding of the input file:
Code:
--input-file-encoding INPUT_FILE_ENCODING
    use the specified encoding for reading the raw contents of input file(s) (default: 'utf-8')
Last edited by tshering; 07-10-2017 at 05:10 PM. |
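For what it's worth, byte 0x92 is not a valid UTF-8 start byte, but in Windows-1252 (cp1252) it is the curly apostrophe (U+2019), which is a common sign of text saved on Windows. The following is only a minimal Python 3 sketch under that assumption; nothing in this thread confirms the dictionary's real encoding, and transcode_to_utf8 is a hypothetical helper name, not part of penelope.

```python
# Sketch only. ASSUMPTION: the text was saved as Windows-1252 (cp1252).
# transcode_to_utf8 is a hypothetical helper, not a penelope function.

def transcode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Return raw_bytes as UTF-8, re-encoding only if they are not valid UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return raw_bytes  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Interpret the bytes in the assumed encoding, then write them as UTF-8.
        return raw_bytes.decode(source_encoding).encode("utf-8")


# Made-up sample line containing the problem byte 0x92.
sample = b"description=Erikson\x92s world\n"
fixed = transcode_to_utf8(sample)
print(fixed.decode("utf-8"))
```

If the guess about cp1252 is wrong, the result will decode without an error but show the wrong characters, so it is worth checking the output by eye before repacking the zip.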
07-10-2017, 07:04 PM | #5 |
Wizard
Posts: 2,841
Karma: 22003124
Join Date: Aug 2014
Device: Kobo Forma, Kobo Sage, Kobo Libra 2
|
Even after using TextEdit to make sure they're encoded as UTF-8, this doesn't work. I'd found a command that reports the encoding, but it gave unknown-8 for the files. So I opened each in TextEdit, made a copy, saved it with UTF-8 encoding, and recompressed using just the new files. I still get an error, though it's slightly different:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte
This may be a lost cause; I don't know who actually made the StarDict dictionary nor what tools they used to do so. There is a .mobi version, though I don't think that does me any good either. Frustrating, since with a series like Malazan it would be really nice to have a dictionary of terms and names. |
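One way to narrow this down without opening files in an editor is to check only the .ifo file inside the zip, since that is the file the traceback shows penelope decoding as UTF-8. As far as I know the StarDict .idx and .dict files are binary, so re-saving them from a text editor may damage them. The sketch below is Python 3 and only illustrative; first_invalid_utf8 and report_ifo_members are hypothetical names, and the archive built at the end is made-up demo data, not the real MalazanDict.zip.

```python
import io
import zipfile


def first_invalid_utf8(data):
    """Return (offset, offending byte) for the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.start + 1]
    return None


def report_ifo_members(zip_bytes):
    """Map each .ifo member of a zip archive to its first invalid byte (or None)."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".ifo"):
                results[name] = first_invalid_utf8(archive.read(name))
    return results


# Demo with an in-memory archive (made-up contents, for illustration only).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("d.ifo", b"bookname=Test\x92s\n")
    archive.writestr("d.idx", b"\xff\x00")  # binary member, deliberately ignored
print(report_ifo_members(demo.getvalue()))
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to report_ifo_members; the reported offset should match the "position" in penelope's error message.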
07-11-2017, 05:45 AM | #6 | |
Fanatic
Posts: 509
Karma: 60774
Join Date: Aug 2013
Device: Kobo Glo, GloHD
|
Quote:
It might be that this character is the BOM (byte order mark) that some programs put at the beginning of a UTF-8 file. Some time ago I made a utility to convert between ANSI and UTF-8 encodings, with the option to add the BOM or not. See if it can help you. It should be in the Software section of www.noembryo.com and it's called Subber (it was made for subtitles...). The direct link is http://www.noembryo.com/apps.php?subber
Last edited by embryo; 07-11-2017 at 05:47 AM. |
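For anyone on a Mac without access to a Windows tool, the BOM check can also be sketched in a few lines of Python. This is only an illustration: strip_utf8_bom is a hypothetical helper name. Note that the UTF-8 BOM is the three bytes EF BB BF, so a file that starts with one begins with 0xEF; a leading 0xBF on its own may point to something else having gone wrong with the file.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up sample: a BOM followed by ordinary text.
with_bom = codecs.BOM_UTF8 + b"version=2.4.2\n"
print(strip_utf8_bom(with_bom))
```

The function works on raw bytes, so it can be applied to the contents of a file read in binary mode before the file is written back and re-zipped.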
|
07-12-2017, 04:54 PM | #7 |
Wizard
Posts: 2,841
Karma: 22003124
Join Date: Aug 2014
Device: Kobo Forma, Kobo Sage, Kobo Libra 2
|
I don't have a Windows machine currently, so that won't work. I may look into this again in a bit, possibly this weekend.
|