10-04-2014, 04:28 PM | #1006 | |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
I hadn't realized you had any python programming/porting experience. |
|
10-04-2014, 04:48 PM | #1007 |
Resident Curmudgeon
Posts: 74,027
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I don't program in Python, but these decisions (if they are made) do affect me and every user of Sigil. If Python 3 is chosen and it makes it harder for those writing plugins to write them, then that's not a good choice. I'm just getting my opinion in now, in case these decisions are made, so that hopefully they will be made on the side of Python 2, which (IMHO) will be a lot easier for more people to program in than Python 3.
|
|
10-04-2014, 05:26 PM | #1008 | |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Sorry, but we (user-none, DiapDealer and I) have discussed this and we disagree completely with you. We don't have a huge Python 2 codebase to worry about like calibre does, and serious bugs in Python 2 are simply not being fixed.
So if we include Python 3 in Sigil but write code that works on both, then we really have the best of both worlds. We can allow an external Python 2 interpreter to be used with Sigil and still bundle Python 3 internally with Sigil.
And fwiw, most plugins are not as extensive as KindleUnpack, so porting them to work on both Python 2 and 3 will not be as hard. It is not like we are mandating code that only works on Python 3 or dropping support for Python 2.
KevinH
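To make the "code that works on both" idea in the post above concrete, here is a minimal sketch of the kind of shim a single codebase can use. The names `PY2`, `text_type`, `binary_type`, `to_utf8_bytes` and `to_text` are hypothetical and chosen for illustration; they are not taken from Sigil or KindleUnpack.

```python
import sys

# True when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_utf8_bytes(value):
    """Return value as UTF-8 encoded bytes on either Python version."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) string on either Python version."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

With helpers like these, the rest of the code can convert at its boundaries and avoid checking the interpreter version everywhere.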
|
|
10-04-2014, 05:51 PM | #1009 | |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
Yes, I read that PEP about variable-size storage methods for the new strings and looked at their data structures for storing strings under the new formats. It looks like an engineer's nightmare. They have fields for latin-1, utf-8, utf-16, and ucs-4, all stored in two different string structures depending on the size of the largest character, and they use bitfields to store info, etc. And their interface routine decisions are a joke.
According to some e-mails, string manipulation slowed down by over 30% whereas storage really wasn't much better. I read an article that said that after simple compression, utf-8 (even for non-BMP) takes up less space than ucs-4 due to the degree of byte repetition for non-BMP languages. So they could have stuck with utf-8 for Linux/unix and used utf-16 le, with multiple chars used to encode ucs-4 when needed, for Windows. Or even moved all platforms to utf-16.
Imagine debugging a buggy program with gdb. You would need gdb macros just to figure out what the string data said!!
I must say that I am very unimpressed with many of the developer decisions. But they don't seem to see how silly they are being and how many future bugs and nightmares they are creating with such nonsense. They have forgotten the KISS principle of all good engineers.
As I said before, if a large organization said they would fork Python 2, fix the many longstanding bugs, and make their own new releases, I would stay with Python 2 and forget Python 3. As it stands, I can just hedge my bets by supporting both with one codebase and seeing what happens.
Take care,
KevinH
Last edited by KevinH; 10-04-2014 at 09:13 PM. |
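The variable-size string storage discussed in the post above can be observed from Python itself. The sketch below assumes CPython 3.3 or later (other interpreters store strings differently) and uses a hypothetical helper, `bytes_per_char`, that measures how much a string grows per added character.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string of ch by n chars."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == '__main__':
    samples = [('ASCII', 'a'), ('Latin-1', '\xe9'),
               ('BMP', '\u3042'), ('non-BMP', '\U0001F600')]
    for label, ch in samples:
        # The width depends on the largest code point in the string.
        print(label, bytes_per_char(ch))
```

On CPython 3.3 or later this is expected to report 1, 1, 2 and 4 bytes per character, which is the "size of the largest character" rule the post refers to.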
|
10-05-2014, 09:40 AM | #1010 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
I have tested a few ebooks and got errors with the experimental code. Yes, it has bugs (quite a lot, I think).
The test environment is as follows: Python versions are 2.7.6 and 3.3.4.1 for Windows 32-bit. The Windows code page is cp932. PYTHONIOENCODING=utf-8 is set.
I got the following errors:
1. HDimage_test.mobi (an epub3 fixed-layout ebook which I posted before)
Successfully unpacked with Python 2; but with Python 3, I got an error message: Spoiler:
2. test2.azw3 (an epub2 reflowable ebook in English with several images)
Got errors with both versions.
With Python 2: Spoiler:
With Python 3: Spoiler:
3. kokoro.mobi (an epub3 rtl reflowable ebook in Japanese)
Unpacked as an epub2 ebook instead of an epub3 with both versions.
I will look at the code and debug it if possible after tomorrow.
Take care,
Last edited by tkeo; 10-05-2014 at 09:49 AM. |
|
10-05-2014, 11:32 AM | #1011 | |||
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Thanks for testing this .... Quote:
- no use of % to fold ascii or utf-8 strings into binary data (there is a pep on this)
- issues with iterating bytes and extracting single bytes from bytestrings, and there is a pep on this as well (pep 467) but nothing definite yet
But I have now fixed this. Quote:
Quote:
I have fixes for error 1 in the tree and I will track down and fix the huffman/cdic code with my own testcase. I will post an updated version once I have both errors fixed. Please keep trying them on as many test cases as you have so that we can exercise all of the code and track down these last issues. Thanks, Kevin |
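The two Python 3 bytes limitations named in the post above (no % formatting on bytes before 3.5, and indexing or iterating bytes yielding integers) can be worked around in a way that behaves the same on Python 2. This is a minimal sketch; `join_bytes`, `byte_at` and `iter_byte_values` are hypothetical helper names, not functions from KindleUnpack.

```python
def join_bytes(*parts):
    """Concatenate byte strings without using the % operator."""
    return b''.join(parts)


def byte_at(data, index):
    """Return the byte at index as an int on both Python 2 and 3."""
    # Slicing always gives a length-1 byte string, which ord() accepts
    # on both versions; plain indexing gives a str on 2 and an int on 3.
    return ord(data[index:index + 1])


def iter_byte_values(data):
    """Iterate over the bytes of data as ints on both Python 2 and 3."""
    return iter(bytearray(data))
```

For example, `join_bytes(b'kindle:embed:', b'0001')` replaces an expression like `b'kindle:embed:%s' % b'0001'`, which fails on Python 3.4 and earlier.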
|||
10-05-2014, 11:45 AM | #1012 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Here is a new nlib.zip that should fix errors 1 and 2. If you have a testcase for error 3, I would be happy to track it down as well. Thanks, Kevin Last edited by KevinH; 10-07-2014 at 12:27 PM. Reason: remove old attachment that has been replaced with a better version |
10-06-2014, 07:11 AM | #1013 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
I have tested the new one. Both 1. HDimage_test.mobi and 2. test2.azw3 are successfully unpacked; however, I get warnings on Python 3:
Warning: There are unprocessed index bytes left: b'0000'
Warning: There are unprocessed index bytes left: b'000000'
Warning: There are unprocessed index bytes left: b'0000'
But 3. the rtl reflowable ebook in Japanese is still unpacked as an epub2. I attach the mobi.
I have also tested other files and got errors.
4. test1.azw3 (a fixed-layout rtl eBook)
Python 2: Spoiler:
Python 3: Spoiler:
5. test4.azw3 (an epub2 eBook in English which has several images)
Successfully unpacked with Python 2. With Python 3: Spoiler:
Take care,
Last edited by tkeo; 10-06-2014 at 09:26 AM. |
10-06-2014, 11:58 AM | #1014 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Yes, both of these are the "can't use % with bytestrings" folding issue. I can fix both of those easily tonight after work. I will also track down why it is only unpacking things to epub 2 and fix that. Thanks for the test cases! And thank you for testing! Once we run out of bugs to fix, I will start with dictionaries and try to debug that as well. Take care, KevinH |
10-07-2014, 12:26 PM | #1015 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Okay, here is an updated version of nlib.zip that should remove the need to use PYTHONIOENCODING, properly handles kokoro.mobi, and attempts to fix your other issues, although I do not have test1.azw3 or test4.azw3 to confirm it works. Please keep testing it to exercise more code and see if you can break it. I will do the same from my end. Thanks again for all of your help. KevinH Last edited by KevinH; 10-07-2014 at 04:22 PM. Reason: removed outdated nlib.zip |
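The thread does not show how the dependency on PYTHONIOENCODING was removed. One common approach, sketched here as an assumption and not as the actual nlib.zip change, is to wrap standard output in a UTF-8 writer at startup so the console code page (cp932 in the report above) no longer matters. The helper names `utf8_writer` and `install_utf8_stdout` are hypothetical.

```python
import codecs
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(binary_stream)


def install_utf8_stdout():
    """Replace sys.stdout with a UTF-8 writer, whatever the code page is."""
    # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
    # on Python 2 sys.stdout already accepts bytes.
    sys.stdout = utf8_writer(getattr(sys.stdout, 'buffer', sys.stdout))


if __name__ == '__main__':
    # Demonstrate on an in-memory stream instead of the real console.
    demo = io.BytesIO()
    utf8_writer(demo).write(u'\u3053\u3053\u308d')
    print(repr(demo.getvalue()))
```

Text written through such a writer is always encoded as UTF-8, so file names outside the console code page no longer raise encoding errors when printed.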
10-07-2014, 03:42 PM | #1016 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
I have fixed a few more bugs when working with dictionaries, printing unknown bytes, correcting the warnings about missing bytes in the index, etc. So please give this version a try. I will delete the older version above. Thanks, KevinH Last edited by KevinH; 10-07-2014 at 04:22 PM. Reason: remove outdated nlib.zip |
10-07-2014, 04:21 PM | #1017 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Still more fixes. Sorry but the number of potential code paths is so large and the number of minor changes needed is quite large. I have now tested with and passed all of my test cases and all of my dictionaries. Hope this one only has a few bugs remaining! KevinH Last edited by KevinH; 10-08-2014 at 11:55 AM. Reason: remove outdated nlib.zip attachment |
10-08-2014, 08:50 AM | #1018 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
I have modified the code; however, I am not sure that my changes are the proper way to fix it. I attach the patch. (Due to the settings of my IDE, many differences in spaces at the end of lines are included.)
I have not tested enough, but I still see a few bugs.
1. krzyzacy-tom-pierwszy.mobi (someone posted it here)
Got an error with Python 3: Spoiler:
2. HDimage_test.mobi
With Python 3, content.opf is not properly constructed. I attach the diff of content.opf v0.75 vs v0.80 on Python 3.
EDIT: It is correctly constructed after I reverted mobi_opf.py. I have removed the diff.
3. test1.azw3
With Python 3, the cover page is inserted incorrectly. Spoiler:
Take care,
EDIT: I have replaced the patch file.
Last edited by tkeo; 10-08-2014 at 09:41 AM. Reason: replaced the attached file |
10-08-2014, 10:50 AM | #1019 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Issue 1 fixed - it was using % to fold data into bytes, which doesn't work in Python 3.4 and earlier but will in 3.5. I made a few changes in mobi_html.py to hopefully fix other cases like this.
Issue 3 fixed - it happened because, in the cover image code in kindleunpack.py, part.find was searching a bytes string for a unicode cover image name and could never find it! I have now fixed that as well.
But I could not see any issue with HDimage_test.mobi at all. I ran it with the 0.75 code, then diffed the output against that of the new nlib code for both Python 2 and Python 3, and the only differences I could see were due to the order of xml attributes, which is random when extracted from dicts.
Exactly what problem are you having? Is this what your patch was for? If so, can you re-post your patch? I cannot unzip it on a Mac or my Linux box at all. Here is what it says when I try. It seems to need some sort of later PK-modified Zip routine:
KevinsiMac:Desktop kbhend$ unzip nlib.patch.zip
Archive: nlib.patch.zip
skipping: nlib.patch.txt need PK compat. v6.3 (can do v2.1)
Thanks,
KevinH
ps. I have attached the very latest version of nlib.zip
Last edited by KevinH; 10-08-2014 at 06:43 PM. Reason: remove outdated attachment |
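The cover image bug described in the post above comes down to searching bytes for a text string. A minimal sketch of the general remedy is to encode the name before searching; `find_name` is a hypothetical helper, and the actual change made in kindleunpack.py is not shown in the thread.

```python
def find_name(part, name):
    """Find name inside the bytes part, encoding name first if it is text."""
    if not isinstance(name, bytes):
        # Text and bytes are distinct types in Python 3, so the search
        # term has to be converted to bytes before it can match.
        name = name.encode('utf-8')
    return part.find(name)
```

The same helper works on Python 2, where `bytes` is an alias for `str` and only `unicode` values get encoded.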
10-08-2014, 12:56 PM | #1020 |
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
On SourceForge I found a p7zip command line program, built it on Mac OS X, and was then able to unpack your nlib.patch.zip. Many of the changes you had were similar to what I had, and others fixed real porting bugs. But there are a few things I cannot integrate from your patch:
1. In compatibility_utils.py, under Windows, you cannot use the following code, as it will break on almost all Windows machines that do not use the code page you use.
+
+    # Conversion of argv to unicode without using cdll.kernel32.GetCommandLineW
+    FILE_SYSTEM_ENCODING = sys.getfilesystemencoding()
+    uargv = []
+    in_double_quotations = False
+    for arg in sys.argv:
+        if not isinstance(arg, unicode):
+            arg = arg.decode(FILE_SYSTEM_ENCODING)
+        if in_double_quotations:
+            if arg[-1] == u'"':
+                in_double_quotations = False
+                arg = arg[:-1]
+            uargv[-1] = uargv[-1] + u' ' + arg
+        else:
+            if arg[0] == u'"':
+                arg = arg[1:]
+                if arg[-1] == u'"':
+                    arg = arg[:-1]
+                else:
+                    in_double_quotations = True
+            uargv.append(arg)
+    return uargv
+
It is much better to use the kernel32 call, and kindleunpack.py has always used this routine (see utf8_utils.py for utf8_argv) in one form or another. It is needed for full unicode (at least utf-16) compliance on all Windows machines under Python 2.
2. The code below should only be used as a last resort.
+if sys.version_info[0] == 2:
+    reload(sys)
+    sys.setdefaultencoding('utf-8')
It hides the automatic upconversions that happen when mixing unicode and bytestrings. Those are exactly the things we need to find, track down, and fix. These often happen when printing. The earlier change I made should have taken care of that. If not, I need a traceback. Luckily, Python 3 always barfs when trying to mix bytes and unicode, so we should be able to find and fix those. We want stdout to be utf-8 encoded so it works on all machines and not just on one specific Windows code page. Please send me the traceback of the problem you meant this to fix.
3. And finally:
@@ -333,7 +333,7 @@
     flowpart = flows[num]
     if fmt == b'inline':
         tag = flowpart
-    else:
+    elif pdir is not None and fnm is not None:
         replacement = b'"../' + utf8_str(pdir) + b'/' + utf8_str(fnm) + b'"'
         tag = flow_pattern.sub(replacement, tag, 1)
         self.used[fnm] = 'used'
Why are these extra conditions needed? Are they needed in kindleunpack v075 as well? If not, this code is probably just hiding some other underlying problem we need to track down.
Thanks,
KevinH
ps. Is there any way you can get your IDE to stop caring about extra whitespace at the end of lines? Either way, I'll run everything through reindent.py to clean up any extra whitespace in the source.
Last edited by KevinH; 10-08-2014 at 06:23 PM. |
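For readers unfamiliar with the kernel32 call mentioned in point 1 above, here is a minimal sketch of the widely used ctypes recipe built on the Win32 functions `GetCommandLineW` and `CommandLineToArgvW`. It is an illustration under assumptions and not the `utf8_argv` routine from utf8_utils.py, which is not shown in the thread; the function name `unicode_argv` and the non-Windows fallback are hypothetical.

```python
import sys


def unicode_argv():
    """Return the command-line arguments as text strings."""
    if sys.platform != 'win32':
        # Assumption: elsewhere, decode byte arguments (Python 2) with the
        # file system encoding; Python 3 arguments are already text.
        enc = sys.getfilesystemencoding() or 'utf-8'
        return [a.decode(enc) if isinstance(a, bytes) else a
                for a in sys.argv]

    # Windows only: ask the OS for the original UTF-16 command line,
    # which does not depend on the active code page.
    from ctypes import POINTER, byref, c_int, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    GetCommandLineW = windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = LPCWSTR

    CommandLineToArgvW = windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
    CommandLineToArgvW.restype = POINTER(LPWSTR)

    argc = c_int(0)
    argv = CommandLineToArgvW(GetCommandLineW(), byref(argc))
    # Skip the interpreter's own arguments so the result lines up
    # with sys.argv.
    start = argc.value - len(sys.argv)
    return [argv[i] for i in range(start, argc.value)]
```

Because Windows parses the quoting itself, this avoids both the hand-written double-quote handling and the dependence on `sys.getfilesystemencoding()` in the patch quoted above.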
|