12-31-2012, 08:48 PM | #466 |
Sigil Developer
Posts: 7,727
Karma: 5444398
Join Date: Nov 2009
Device: many
|
Hi Sergey,
If you are not seeing the correct characters in the Log window when running the GUI, please try replacing the QueuedStream class in Mobi_Unpack.pyw with the following: Code:
# Wrap a stream so that output gets appended to shared queue
# using utf-8 encoding
class QueuedStream:
    def __init__(self, stream, q):
        self.stream = stream
        self.encoding = stream.encoding
        self.q = q
        if self.encoding == None:
            self.encoding = 'utf-8'
    def write(self, data):
        if isinstance(data,unicode):
            data = data.encode('utf-8',"replace")
        elif self.encoding != 'utf-8':
            udata = data.decode(self.encoding)
            data = udata.encode('utf-8', "replace")
        self.q.put(data)
    def __getattr__(self, attr):
        return getattr(self.stream, attr)
This should decode the stdout from mobi_unpack.py (which will be in your local Russian code page) and encode it into utf-8 so that it gets written properly to the Log window (hopefully). Please let me know if this helps. Thanks, KevinH |
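For anyone who wants to try the transcoding step on its own, here is a minimal Python 3 sketch of the same idea (the class above is Python 2). The function name to_utf8 and the cp1251 code page are assumptions for illustration only; the post just says "local Russian code page".

```python
def to_utf8(data, source_encoding="cp1251"):
    """Return UTF-8 bytes for text, or for bytes given in source_encoding.

    to_utf8 and the cp1251 default are hypothetical, for illustration only.
    """
    if isinstance(data, str):
        # Already decoded text: just encode it.
        return data.encode("utf-8", "replace")
    if source_encoding.lower().replace("-", "") != "utf8":
        # Bytes in a local code page: decode first, then re-encode.
        return data.decode(source_encoding).encode("utf-8", "replace")
    # Bytes that are already UTF-8 pass through unchanged.
    return data


raw = "Привет".encode("cp1251")  # bytes as a cp1251 console would emit them
print(to_utf8(raw))              # the same word as UTF-8 bytes
```

The decode step only works if the source encoding is guessed correctly, which is why the class takes it from stream.encoding rather than hard-coding it.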
01-01-2013, 07:19 PM | #467 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2012
Device: Kindle
|
To: DiapDealer about quoteattr().
quoteattr() doesn't change " to ' and back in the attribute value, if that is what you mean. If the attribute value doesn't contain ", quoteattr() puts it inside "" without additional encoding. The same with '. If both ' and " are present in the value, quoteattr() replaces " with &quot; and uses " around the value. If for some reason you always want attribute values inside "", you can escape " yourself. There is no need to escape ' in that case. Last edited by Sergey Dubinets; 01-01-2013 at 07:23 PM. |
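The quoting rules described above can be checked directly against xml.sax.saxutils.quoteattr from the standard library. This is only a small sketch with made-up sample values, run under Python 3.

```python
from xml.sax.saxutils import quoteattr

# Sample attribute values (made up for illustration).
samples = [
    'plain',            # no quotes at all  -> wrapped in "..."
    "it's",             # only '            -> wrapped in "..."
    'say "hi"',         # only "            -> wrapped in '...'
    'both " and \'',    # both              -> " becomes &quot;, wrapped in "..."
]

for value in samples:
    print(quoteattr(value))
```

Only the last case involves any escaping of quote characters; in the others quoteattr() simply picks a delimiter that does not occur in the value.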
01-01-2013, 07:39 PM | #468 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2012
Device: Kindle
|
To DiapDealer about double unescaping.
It is not as innocent as it may appear. Of course, if the value doesn't contain any '&', additional unescaping does no harm. The problem happens when the unescaped value contains a known entity. For example, suppose the title of the article is "Don't double unescape &amp; in metadata". The escaped string would be "Don't double unescape &amp;amp; in metadata". If you unescape it twice, or unescape the original string, you get "Don't double unescape & in metadata", and this is not what the original title was. In short: double unescaping is a bug and it results in "data loss". |
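The example can be reproduced in a few lines. This is a sketch in Python 3 using html.unescape and xml.sax.saxutils.escape as stand-ins; it is not the code from the script itself, and the variable names are made up.

```python
from html import unescape
from xml.sax.saxutils import escape

# The title whose literal text contains the five characters "&amp;".
title = "Don't double unescape &amp; in metadata"

stored = escape(title)   # what a correctly escaped file would hold
once = unescape(stored)  # correct: one unescape recovers the title
twice = unescape(once)   # bug: the literal "&amp;" collapses to "&"

print(once == title)     # True
print(twice == title)    # False
```

Unescaping the original title directly gives the same wrong result as unescaping the stored string twice.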
01-01-2013, 10:10 PM | #469 | |
Grand Sorcerer
Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Code:
xmlescape(self.h.unescape(value))
And makes it: "Don't double unescape & in metadata". Then saxutils.escape() makes it: "Don't double unescape &amp; in metadata". No data loss. And you can't create a mobi with kindlegen that preserves and displays the literal text "&amp;" in the title anyway. Your example is a perfect illustration of why I've chosen to do it the way I have. Without HTMLParser's initial unescape(), using the saxutils escape() method alone (which is required to handle any html tags or unescaped ampersands) would result in a valid "&amp;" being turned into "&amp;amp;". Just like you described. The current method will preserve all pre-existing &lt; &gt; and &amp; entities while converting any other entities encountered to their character representations and properly escaping any html tags and naked ampersands. Last edited by DiapDealer; 01-01-2013 at 10:32 PM. |
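A rough Python 3 sketch of the unescape-then-escape approach described in this post. The helper name normalize is hypothetical, and html.unescape is used here as a stand-in for the HTMLParser unescape() call in the Python 2 script, so treat it as an illustration under those assumptions rather than the script's actual code.

```python
from html import unescape
from xml.sax.saxutils import escape


def normalize(value):
    """Hypothetical helper: HTML-unescape first, then XML-escape."""
    return escape(unescape(value))


samples = [
    "Tom &amp; Jerry",  # pre-existing &amp; entity is preserved
    "Tom & Jerry",      # naked ampersand gets escaped
    "a &lt; b",         # pre-existing &lt; entity is preserved
    "caf&eacute;",      # other entities become characters
    "<b>bold</b>",      # html tags get escaped
]

for raw in samples:
    print(ascii(raw), "->", ascii(normalize(raw)))
```

With escape() alone, the first sample would come out as "Tom &amp;amp; Jerry", which is the problem the initial unescape() avoids.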
|
01-02-2013, 12:32 AM | #470 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2012
Device: Kindle
|
If the value in the mobi file is HTML-escaped, you need to unescape it using HTML rules for processing and then escape it according to XML rules when writing to the XML file. As you do.
My statement was: you can't unescape "just in case" (because no harm was done). If the metadata has escaped strings, we have to unescape them. If it has non-escaped strings, we shouldn't do this. |
01-02-2013, 03:07 AM | #471 | |
The Grand Mouse 高貴的老鼠
Posts: 71,618
Karma: 306652114
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
All we can do is pick the least bad option. Which, IMO, is to unescape the text. It is far, far more common that the text has been escaped than that the text is unescaped but with some apparently escaped entities. |
|
01-02-2013, 09:17 PM | #472 |
Sigil Developer
Posts: 7,727
Karma: 5444398
Join Date: Nov 2009
Device: many
|
Mobi_Unpack_experimental V2
Hi,
Here is a new version of Mobi_Unpack (experimental v2) which has all known bug fixes in place. Its primary new feature is a more robust GUI (Mobi_Unpack.pyw) that should better support international users on Windows, with improved full unicode support for all file paths and file names. This should be considered beta-level software. I would really appreciate hearing back about any successes or failures. If it works well, this version should become Mobi_Unpack_v061 final. Thanks, KevinH |
01-07-2013, 02:17 PM | #473 |
Member
Posts: 24
Karma: 10
Join Date: Jan 2013
Device: Kobo Glo
|
Hello,
I'm new to the forum and I've found these wonderful scripts, but I can't work out how to launch the mobi_dict.py script. I've seen no specific option in the GUI, and launching it using python (with or without arguments) has no effect: Code:
python mobi_dict.py <in> <out>
Thanks, Loceka. |
01-07-2013, 02:48 PM | #474 |
Grand Sorcerer
Posts: 5,607
Karma: 23165369
Join Date: Dec 2010
Device: Kindle PW2
|
AFAIK, mobi_dict.py is a module that is automatically called by the main .pyw if a dictionary .mobi file is detected.
I.e. if you want to decompile a dictionary simply execute Mobi_Unpack.pyw. It'll automatically execute mobi_dict.py with the correct parameters. |
01-07-2013, 02:55 PM | #475 | |
Grand Sorcerer
Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
01-08-2013, 02:31 PM | #476 |
Member
Posts: 24
Karma: 10
Join Date: Jan 2013
Device: Kobo Glo
|
Thank you both for your answers.
I was misled by the file name and thought it was meant to convert Mobi dictionaries to the DICT format, my bad. The dictionaries I tried to extract were mostly extracted successfully, despite the errors in the logs: Code:
Error: Dictionary contains multiple inflection index sections, which is not yet supported
Error: Dictionary uses obsolete inflection rule scheme which is not yet supported |
01-09-2013, 05:05 PM | #477 |
Member
Posts: 24
Karma: 10
Join Date: Jan 2013
Device: Kobo Glo
|
Well thank you all again for those scripts.
I've made one of my own (in Perl) that converts a mobi dictionary into a Kobo format dictionary. Actually it may not be really useful, because the ones I tried did not match the default Kobo dictionaries, but still it worked for me. As for the script itself, it must be launched as: Code:
perl mobi2kobo.pl -i <input file> -o <output dir>
It also requires some third-party programs:
|
01-15-2013, 09:46 AM | #478 |
Connoisseur
Posts: 86
Karma: 470352
Join Date: Dec 2012
Device: Kindle Fire, IPad
|
Thx to all!
Hit |
01-17-2013, 03:05 AM | #479 |
The Grand Mouse 高貴的老鼠
Posts: 71,618
Karma: 306652114
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Version 0.61 has now been uploaded to the first post of the thread.
This includes all recent fixes for the scripts, and now should fully support the use of unicode file names, thanks to lots of work by KevinH. With version 0.61 the name of the script has been changed to KindleUnpack, since almost all Mobipocket files are now actually Kindle files from Amazon, and the script certainly handles files that are not Mobipocket at all (KF8 and .azw4). |
01-17-2013, 07:51 PM | #480 |
Resident Curmudgeon
Posts: 74,576
Karma: 129670952
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Now we just need the Calibre plugin updated.
|
|