Old 10-04-2014, 04:28 PM   #1006
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by JSWolf View Post
To be honest, it's best to Bundle Python 2 and forget Python 3 exists.
To be honest, I think it's best if you find something else to worry about (or at least wait until any of these decisions actually affect you in the slightest).

I hadn't realized you had any python programming/porting experience.
Old 10-04-2014, 04:48 PM   #1007
JSWolf
Resident Curmudgeon
Posts: 74,027
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DiapDealer View Post
To be honest, I think it's best if you find something else to worry about (or at least wait until any of these decisions actually affect you in the slightest).

I hadn't realized you had any python programming/porting experience.
I don't program in Python, but these decisions (if they are made) do affect me and every user of Sigil. If Python 3 is chosen and it makes it harder for those writing plugins to write them, then that's not a good choice. I'm just getting my opinion in, in case these decisions are made, so hopefully they will be made on the side of Python 2, which (IMHO) will be a lot easier for more people to program in than Python 3.
Old 10-04-2014, 05:26 PM   #1008
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi,
Sorry, but we (user-none, DiapDealer, and I) have discussed this and we disagree completely with you. We don't have a huge Python 2 codebase to worry about like calibre does, and serious bugs in Python 2 are simply not being fixed. So if we bundle Python 3 with Sigil but write code that works on both, then we really have the best of both worlds: we can allow an external Python 2 interpreter to be used with Sigil and still bundle Python 3 internally with Sigil.

And fwiw, most plugins are not as extensive as KindleUnpack, so porting them to work on both Python 2 and 3 will not be as hard. It is not like we are mandating code that only works on Python 3 or dropping support for Python 2.
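
To illustrate what "code that works on both" can look like in practice, here is a minimal sketch. The names PY3, text_type, to_bytes, and to_text are hypothetical helpers made up for this example; they are not taken from Sigil or KindleUnpack.

```python
import sys

PY3 = sys.version_info[0] >= 3

# Hypothetical alias: one name for "text" on either interpreter.
if PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_bytes(value, encoding="utf-8"):
    """Return value as a bytestring, encoding it first if it is text."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a bytestring."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

The idea is to convert at the boundaries (file reads, console output) and keep one type inside each code path, so the same source runs under either interpreter.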

KevinH

Quote:
Originally Posted by JSWolf View Post
I would think it would be wiser for Sigil to bundle Python 2 since there is a lot more code out there in Python 2 than Python 3. Many people dislike Python 3 and are sticking to Python 2. Plus, porting over Python 2 code is easier than porting over Python 2 code to run on Python 3. Add to that the fact that people who program in Python 2 would then have a learning curve moving to Python 3.

To be honest, it's best to Bundle Python 2 and forget Python 3 exists.
Old 10-04-2014, 05:51 PM   #1009
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Kovid,

Yes, I read that PEP about variable-size storage methods for the new strings and looked at their data structures for storing strings under the new formats. It looks like an engineer's nightmare. They have fields for latin-1, utf-8, utf-16, and ucs-4 all stored in two different string structures depending on the size of the largest character, and they use bitfields to store info, etc.

And their interface routine decisions are a joke. According to some e-mails, string manipulation slowed down by over 30% whereas storage really wasn't much better. I read an article that said that after simple compression, utf-8 (even for non-BMP) takes up less space than ucs-4 due to the degree of byte repetition for non-BMP languages. So they could have stuck with utf-8 for Linux/unix and used utf-16 le with multiple chars used to encode ucs-4 when needed for Windows. Or even moved all platforms to utf-16. Imagine debugging a buggy program with gdb. You would need gdb macros to just figure out what the string data said!!
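
For readers who want to see the variable-width storage being discussed, here is a small sketch that estimates how many bytes CPython uses per character for different kinds of content. It assumes CPython 3.3 or later; bytes_per_char and the sample characters are made up for this illustration, and sys.getsizeof is implementation-specific.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two string lengths.

    Subtracting the sizes cancels out the fixed object header.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = {
    "latin-1": u"a",           # fits in one byte
    "BMP": u"\u3042",          # Japanese hiragana, needs two bytes
    "non-BMP": u"\U0001F600",  # outside the BMP, needs four bytes
}

widths = {name: bytes_per_char(ch) for name, ch in samples.items()}
for name in ("latin-1", "BMP", "non-BMP"):
    print(name, widths[name])
```

On CPython 3.3+ this should report 1, 2, and 4 bytes per character: the whole string is stored at the width of its widest character, which is why C code calling external APIs has to inspect the string kind first.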

I must say that I am very unimpressed with many of the developer decisions. But they don't seem to see how silly they are being and how many future bugs and nightmares they are creating with such nonsense. They have forgotten the KISS principle of all good engineers.

As I said before, if a large organization said they would fork Python 2 and fix the many longstanding bugs and make their own new releases, I would stay with Python 2 and forget Python 3. As it stands I can just hedge my bets, by supporting both with one codebase and seeing what happens.

Take care,

KevinH

Quote:
Originally Posted by kovidgoyal View Post
@KevinH: I see you've started discovering the joys of Python 3. Be glad you don't have to port any C extension modules. In Python 2 strings are internally always UTF-16 (except on linux), which is great because external libraries (the Windows API, ICU, etc.) all use UTF-16. As of python 3.3 a python string can be any of ascii, UCS2 or UCS4, depending on its contents. So now every time you call any external API function with a python string, you have to inspect and convert it. Joy, joy, joy.

And if you thought that dealing with binary file formats was bad, think about all the network facing code -- all network protocols are binary. I really don't know what the python 3 devs were smoking. Thank heavens python is open source and I can continue using python 2 for a long, long time. Hopefully, I can retire before it becomes necessary to port calibre from python 2.

Last edited by KevinH; 10-04-2014 at 09:13 PM.
Old 10-05-2014, 09:40 AM   #1010
tkeo
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
Hi Kevin,

I have tested a few ebooks and got errors with the experimental code. Yes, it has (I think a lot of) bugs.

The experimental environment is as follows:
python versions are 2.7.6 and 3.3.4.1 for windows 32bit.
The codepage of the Windows is cp932.
PYTHONIOENCODING=utf-8 is set.

I got the following errors:

1. HDimage_test.mobi (an epub3 fixed layout ebook which I posted before)

Successfully unpacked with python 2; but with python 3, got an error message:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 38 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 5 section of book...
Mobi Version: 5
Codec: utf-8
Title: b'HD Content test'
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image00003.jpeg from section 3
Extracting image: image00004.jpeg from section 4
Extracting image: image00005.jpeg from section 5
Extracting image: image00006.jpeg from section 6
Extracting image: image00007.jpeg from section 7
Extracting image: cover00008.jpeg from section 8
Extracting image: image00010.jpeg from section 10
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: b'HD Content test'
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting HD image: HDimage00029.jpeg from section 29
Extracting HD image: HDimage00030.jpeg from section 30
Extracting HD image: HDimage00031.jpeg from section 31
Extracting HD image: HDimage00032.jpeg from section 32
Extracting HD image: HDimage00034.jpeg from section 34
Unpacking raw markup language
Warning: There are unprocessed index bytes left: b'0000'
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 523, in processMobi8
usedmap = htmlproc.buildXHTML()
File "mobi_html.py", line 367, in buildXHTML
replacement = b'%s%s%s'%(osep, b'../Images/' + imageName, csep)
TypeError: can't concat bytes to str


2. test2.azw3 (an epub2 reflowable ebook in English with several images)
Got errors with both versions.

with python 2:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXXXXXX
EXTH Title: XXXXXXXX
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00172.jpeg from section 172
Extracting image: image00173.jpeg from section 173
Extracting image: image00174.gif from section 174
Extracting image: image00175.gif from section 175
Extracting image: image00176.jpeg from section 176
Extracting image: image00177.gif from section 177
Extracting image: image00178.gif from section 178
Extracting image: cover00179.jpeg from section 179
Extracting image: image00180.jpeg from section 180
Extracting image: image00181.jpeg from section 181
Extracting image: image00183.jpeg from section 183
Unpacking raw markup language
Error: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 456, in processMobi8
rawML = mh.getRawML()
File "mobi_header.py", line 785, in getRawML
dataList.append(self.unpack(data))
File "mobi_uncompress.py", line 131, in unpack
slice = self.unpack(slice)
File "mobi_uncompress.py", line 133, in unpack
s += slice
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)


with python 3:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 869, in unpackBook
mh = MobiHeader(sect,0)
File "mobi_header.py", line 484, in __init__
reader.loadCdic(self.sect.loadSection(huffoff+i))
File "mobi_uncompress.py", line 97, in loadCdic
self.dictionary += lmap(getslice, struct.unpack_from(b'>%dH' % n, cdic, 16))
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'


3. kokoro.mobi (an epub3 rtl reflowable ebook in Japanese)
Unpacked as an epub2 ebook instead of an epub3 with both versions.


I will look at the code and debug it if possible, starting the day after tomorrow.

Take care,

Last edited by tkeo; 10-05-2014 at 09:49 AM.
Old 10-05-2014, 11:32 AM   #1011
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

Thanks for testing this ....

Quote:
Originally Posted by tkeo View Post
1. HDimage_test.mobi (an epub3 fixed layout ebook which I posted before)

Successfully unpacked with python 2; but with python 3, got an error message:

replacement = b'%s%s%s'%(osep, b'../Images/' + imageName, csep)
TypeError: can't concat bytes to str
This was a combination of problems:

- Python 3 does not allow % to be used to fold ascii or utf-8 strings into binary data (there is a PEP on this)

- iterating over bytes and extracting single bytes from bytestrings behave differently in Python 3, and there is a PEP on this as well (PEP 467) but nothing definite yet

But I have now fixed this.
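
For anyone hitting the same two problems, here is a hedged sketch of one way to sidestep them. The helper names build_replacement and iter_single_bytes are hypothetical and are not the actual fix that went into mobi_html.py; they only show the general technique.

```python
def build_replacement(osep, image_name, csep):
    """Assemble a bytestring without using % on bytes.

    image_name may arrive as text (the cause of "can't concat bytes
    to str"), so it is encoded before joining.
    """
    if not isinstance(image_name, bytes):
        image_name = image_name.encode("utf-8")
    return b"".join([osep, b"../Images/", image_name, csep])


def iter_single_bytes(data):
    """Yield length-1 bytestrings on both Python 2 and Python 3.

    Slicing always returns bytes, whereas indexing or iterating
    returns ints on Python 3.
    """
    for i in range(len(data)):
        yield data[i:i + 1]
```

Slicing with data[i:i + 1] is the detail that matters: it behaves the same on both interpreters, while data[i] does not.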

Quote:
2. test2.azw3 (an epub2 reflowable ebook in English with several images)
Got errors with both versions.

with python 2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

with python 3:
self.dictionary += lmap(getslice, struct.unpack_from(b'>%dH' % n, cdic, 16))
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
This is because I have not tried books with huffman cdic compression. I will generate a few test cases and see if I can track this down.
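
As a side note for readers, the TypeError comes from applying % to a bytes format string. A minimal sketch of one workaround follows, assuming the format can be built as a native str (that is, without unicode_literals in effect on Python 2). The function name read_offsets is hypothetical and is not the code in mobi_uncompress.py.

```python
import struct


def read_offsets(cdic, n, start=16):
    """Read n big-endian unsigned shorts beginning at byte offset start."""
    # Build the format as a native str: % works on str under both
    # Python 2 and Python 3, and struct accepts a str format.
    fmt = ">%dH" % n
    return struct.unpack_from(fmt, cdic, start)
```

Only the format string changes type here; the data being unpacked stays a bytestring throughout.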

Quote:
3. kokoro.mobi (an epub3 rtl reflowable ebook in Japanese)
Unpacked as an epub2 ebook instead of the epub3 with the both versions.
Probably due to a comparison against a string constant where either the variable being tested or the constant itself is a bytestring and the other is unicode, or vice versa.
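
To make that failure mode concrete: on Python 3 an equality test between a bytestring and a text string is always False, with no error raised, so a version check can silently take the wrong branch. The sketch below is a generic illustration; same_value is a hypothetical helper and is not the actual fix.

```python
def same_value(a, b, encoding="utf-8"):
    """Compare two values after decoding any bytestrings to text."""
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a == b


# On Python 3, b"3" == "3" is simply False, so a test such as
# `if version == "3":` never fires when version is bytes.
version = b"3"
matches = same_value(version, u"3")
```

On Python 2 the plain comparison happens to work for ASCII content, which is why a bug like this can appear only under Python 3.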


I have fixes for error 1 in the tree and I will track down and fix the huffman/cdic code with my own testcase. I will post an updated version once I have both errors fixed. Please keep trying them on as many test cases as you have so that we can exercise all of the code and track down these last issues.

Thanks,

Kevin
Old 10-05-2014, 11:45 AM   #1012
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

Here is a new nlib.zip that should fix errors 1 and 2. If you have a testcase for error 3, I would be happy to track it down as well.

Thanks,

Kevin

Last edited by KevinH; 10-07-2014 at 12:27 PM. Reason: remove old attachment that has been replaced with a better version
Old 10-06-2014, 07:11 AM   #1013
tkeo
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
Hi Kevin,

I have tested the new one.
Both
1. HDimage_test.mobi
and
2. test2.azw3
are successfully unpacked; however, I get these warnings on python 3:

Warning: There are unprocessed index bytes left: b'0000'
Warning: There are unprocessed index bytes left: b'000000'
Warning: There are unprocessed index bytes left: b'0000'

But 3., the rtl reflowable ebook in Japanese, is still unpacked as an epub2.
I attach the mobi.

I have also tested other files and got errors.

4. test1.azw3 (a fixed layout rtl eBook)

python 2:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 224 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXX
EXTH Title: XXXX

Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: cover00046.jpeg from section 46
.
.
.
Extracting image: image00213.jpeg from section 213
Warning: RESC section length(14486bytes) does not match its size(14434bytes).
Extracting image: image00215.jpeg from section 215
Warning: Section 218 does not contain a recognised resource
Warning: Section 219 does not contain a recognised resource
Unpacking raw markup language
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Error: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 523, in processMobi8
usedmap = htmlproc.buildXHTML()
File "mobi_html.py", line 343, in buildXHTML
part = b"".join(srcpieces)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)



python 3:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 224 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXX
EXTH Title: XXXX

Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: cover00046.jpeg from section 46
.
.
.
Extracting image: image00213.jpeg from section 213
Warning: RESC section length(14486bytes) does not match its size(14434bytes).
Extracting image: image00215.jpeg from section 215
Warning: Section 218 does not contain a recognised resource
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 817, in process_all_mobi_headers
imgnames, image_ptr = processImage(i, files, imgnames, sect, data, beg, image_ptr, cover_offset)
File "kindleunpack.py", line 387, in processImage
sect.setsectiondescription(i,"Mysterious Section, first four bytes %s" % describe(data[0:4]))
File "mobi_sectioner.py", line 37, in describe
txtans += i
TypeError: Can't convert 'int' object to str implicitly



5. test4.azw3 (an epub2 eBook which has several images in English)

Successfully unpacked with python 2.
With python3:

Spoiler:
Unpacking Book...
Palm DB type: BOOKMOBI, 363 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXX
EXTH Title: XXXX
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00341.jpeg from section 341
Extracting image: image00342.jpeg from section 342
Extracting image: image00343.jpeg from section 343
Extracting image: image00344.jpeg from section 344
Extracting image: image00345.jpeg from section 345
Extracting image: image00346.jpeg from section 346
Extracting image: image00347.jpeg from section 347
Extracting image: image00348.jpeg from section 348
Extracting image: image00349.jpeg from section 349
Extracting image: image00350.jpeg from section 350
Extracting image: image00351.jpeg from section 351
Extracting image: image00352.jpeg from section 352
Extracting image: image00353.jpeg from section 353
Extracting image: cover00354.jpeg from section 354
Extracting image: image00356.jpeg from section 356
Unpacking raw markup language
Warning: There are unprocessed index bytes left: b'0000'
Warning: There are unprocessed index bytes left: b'0000'
Warning: There are unprocessed index bytes left: b'000000'
Processing ncx / toc
Warning: There are unprocessed index bytes left: b'00'
Building an epub-like structure
Building proper xhtml for each file
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 523, in processMobi8
usedmap = htmlproc.buildXHTML()
File "mobi_html.py", line 195, in buildXHTML
lambda m:b' style="page-break-after:%s"'%m.group(1), tag)
File "mobi_html.py", line 195, in <lambda>
lambda m:b' style="page-break-after:%s"'%m.group(1), tag)
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'




Take care,
Attached Files
File Type: mobi kokoro.mobi (946.9 KB, 180 views)

Last edited by tkeo; 10-06-2014 at 09:26 AM.
Old 10-06-2014, 11:58 AM   #1014
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

Yes, both of these are the "can't use % with bytestrings" folding issue. I can fix both of those easily tonight after work. I will also track down why it is only unpacking things to epub 2 and fix that. Thanks for the test cases! And thank you for testing! Once we run out of bugs to fix, I will start with dictionaries and try to debug that as well.
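
For reference, here is a hedged sketch of the kind of change involved for the re.sub lambda in the last traceback. The pattern PAGE_BREAK and the function add_page_break_style are hypothetical stand-ins, not the real code in mobi_html.py; bytes % formatting only came back in Python 3.5 (PEP 461), so on the 3.3 interpreter used in these tests concatenation is one option.

```python
import re

# Hypothetical pattern, for illustration only.
PAGE_BREAK = re.compile(br'\s+data-page-break="(\w+)"')


def add_page_break_style(tag):
    """Rewrite the attribute using bytes concatenation instead of %."""
    return PAGE_BREAK.sub(
        lambda m: b' style="page-break-after:' + m.group(1) + b'"',
        tag,
    )
```

Because the pattern is a bytes pattern, m.group(1) is also bytes, so every piece being concatenated has the same type on both Python 2 and Python 3.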

Take care,

KevinH
Old 10-07-2014, 12:26 PM   #1015
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

Okay, here is an updated version of nlib.zip that should take care of the need to use PYTHONIOENCODING, properly handles kokoro.mobi, and attempts to fix your other issues, although I do not have test1.azw3 or test4.azw3 to confirm it works.
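
For readers wondering how a script can avoid depending on PYTHONIOENCODING, one possible approach (an assumption for illustration, not necessarily what nlib.zip does) is to wrap the binary output stream in a writer with an explicit encoding. The name make_utf8_writer is hypothetical.

```python
import io


def make_utf8_writer(raw):
    """Wrap a binary stream so that text written to it is encoded as UTF-8.

    Characters that cannot be encoded are replaced instead of raising,
    so a title in an unexpected script cannot abort the unpack.
    """
    return io.TextIOWrapper(raw, encoding="utf-8", errors="replace",
                            line_buffering=True)


# Possible use on Python 3 (sys.stdout.buffer does not exist on Python 2):
#     import sys
#     sys.stdout = make_utf8_writer(sys.stdout.buffer)
```

This fixes the encoding inside the program itself, so the result no longer depends on the console codepage (cp932 in the test environment above) or on environment variables.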

Please keep testing it to exercise more code and see if you can break it. I will do the same from my end.

Thanks again for all of your help.

KevinH

Last edited by KevinH; 10-07-2014 at 04:22 PM. Reason: removed outdated nlib.zip
Old 10-07-2014, 03:42 PM   #1016
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

I have fixed a few more bugs when working with dictionaries, printing unknown bytes, correcting the warnings about missing bytes in the index, etc.
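
As an illustration of the "printing unknown bytes" problem (the describe traceback reported earlier, where iterating over bytes yields ints on Python 3), here is a minimal sketch that works on both interpreters. It is a hypothetical rewrite for this example, not the actual describe function in mobi_sectioner.py.

```python
import binascii


def describe(data):
    """Return a printable description of a short bytestring."""
    # hexlify works on whole bytestrings, so no per-byte iteration is needed.
    hexed = binascii.hexlify(data).decode("ascii")
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return "0x" + hexed
    if text.isalnum():
        # Show readable tags such as section identifiers alongside the hex.
        return '"%s" 0x%s' % (text, hexed)
    return "0x" + hexed
```

For example, describe(b"RESC") gives "RESC" followed by its hex form, while arbitrary binary data is shown as hex only.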

So please give this version a try. I will delete the older version above.

Thanks,

KevinH

Last edited by KevinH; 10-07-2014 at 04:22 PM. Reason: remove outdated nlib.zip
Old 10-07-2014, 04:21 PM   #1017
KevinH
Sigil Developer
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

Still more fixes. Sorry, but the number of potential code paths is large, and so is the number of minor changes needed. I have now tested with, and passed, all of my test cases and all of my dictionaries.


Hope this one only has a few bugs remaining!

KevinH

Last edited by KevinH; 10-08-2014 at 11:55 AM. Reason: remove outdated nlib.zip attachment
Old 10-08-2014, 08:50 AM   #1018
tkeo
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
Hi Kevin,

I have modified the code; however, I am not sure that my changes are the proper way to fix things.
I attach the patch. (Due to the settings of my IDE, many differences in trailing spaces at the ends of lines are included.)

I have not tested enough, but I still see a few bugs.

1. krzyzacy-tom-pierwszy.mobi (someone posted it here)

got an error with python 3:

Spoiler:
Unpacking Book...
Palm DB type: BOOKMOBI, 767 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Krzyżacy, tom pierwszy
EXTH Title: Krzyżacy, tom pierwszy
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: cover00346.jpeg from section 346
Extracting image: image00347.jpeg from section 347
Extracting image: image00348.jpeg from section 348
Extracting image: image00349.jpeg from section 349
Extracting font: font00350
Extracting font: font00351
Extracting font: font00352
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Krzyナシacy, tom pierwszy
EXTH Title: Krzyナシacy, tom pierwszy
Palmdoc compression
Unpacking images, resources, fonts, etc
Unpacking raw markup language
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Building a cover page.
Building an opf for mobi8 using epub version: 2
Write K8 ncx
Creating an epub-like file
Traceback (most recent call last):
  File "kindleunpack.py", line 1022, in <module>
    sys.exit(main())
  File "kindleunpack.py", line 1010, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "kindleunpack.py", line 922, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "kindleunpack.py", line 839, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "kindleunpack.py", line 597, in processMobi8
    files.makeEPUB(usedmap, obfuscate_data, uuid)
  File "D:unpack_structure.py", line 128, in makeEPUB
    data = mangle_fonts(key, data)
  File "mobi_utils.py", line 173, in mangle_fonts
    encrypt = b''.join([bchr(bord(x)^next(key)) for x in crypt])
  File "mobi_utils.py", line 173, in <listcomp>
    encrypt = b''.join([bchr(bord(x)^next(key)) for x in crypt])
TypeError: ord() expected string of length 1, but int found
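
For what it is worth, this TypeError is the usual Python 3 behaviour: iterating over a bytes object yields ints rather than length-1 strings, so an ord()-style helper applied to each element fails. Below is a minimal sketch of an XOR over bytes that behaves the same on Python 2 and 3. The name `xor_bytes` and its arguments are hypothetical and only for illustration; this is not the actual mobi_utils.py code.

```python
import itertools


def xor_bytes(data, key):
    """XOR every byte of data with a repeating key (both are bytes objects)."""
    # bytearray yields ints on both Python 2 and Python 3, so no ord()/chr()
    # round trip is needed.
    key_cycle = itertools.cycle(bytearray(key))
    return bytes(bytearray(b ^ next(key_cycle) for b in bytearray(data)))


if __name__ == "__main__":
    scrambled = xor_bytes(b"font data", b"\x01\x02\x03")
    # XOR with the same key is its own inverse.
    print(xor_bytes(scrambled, b"\x01\x02\x03") == b"font data")
```

Going through bytearray avoids any version check, at the cost of one extra copy of the data.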


2. HDimage_test.mobi

With python 3, content.opf is not properly constructed.
I attach the diff of content.opf v0.75 vs v0.80 on python 3.
EDIT
It is constructed correctly after I reverted mobi_opf.py.
I have removed the diff.

3. test1.azw3

With python3, the cover page is inserted incorrectly.

Spoiler:
<item id="inserted" media-type="application/xhtml+xml" href="Text/cover_page.xhtml" />
<item id="x_p-cover" media-type="application/xhtml+xml" href="Text/part0000.xhtml" />


Take care,

EDIT
I have replaced the patch file.
Attached Files
File Type: zip nlib.patch.zip (4.5 KB, 153 views)

Last edited by tkeo; 10-08-2014 at 09:41 AM. Reason: replaced the attached file
Old 10-08-2014, 10:50 AM   #1019
KevinH
Sigil Developer
KevinH ought to be getting tired of karma fortunes by now.
 
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,


Issue 1 fixed. The code was using % to fold data into bytes, which does not work in python 3.4 and earlier (but will in 3.5). I made a few changes in mobi_html.py to hopefully fix other cases like this.
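
To illustrate the point about %: bytes objects have no % operator in Python 3.0 through 3.4, so formatting into a bytes template raises a TypeError there. Here is a small sketch of a construction that works on Python 2 and on every Python 3 version. `wrap_in_tag` is a hypothetical helper made up for this example, not code from mobi_html.py.

```python
def wrap_in_tag(tag, payload):
    """Return b'<tag>payload</tag>' built from bytes pieces only."""
    # b'<%s>%s</%s>' % (tag, payload, tag) fails on Python 3.0-3.4, where
    # bytes has no % operator; joining the pieces works everywhere.
    return b"".join([b"<", tag, b">", payload, b"</", tag, b">"])


if __name__ == "__main__":
    print(wrap_in_tag(b"p", b"hello"))
```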

Issue 3 fixed. In the cover image code in kindleunpack.py, part.find was searching a bytes string for a unicode cover image name and could never find it! I have now fixed that as well.
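
A minimal sketch of this kind of fix, assuming the markup is held as bytes and the image name may arrive as unicode. `find_name` is a hypothetical helper for illustration only, not the actual kindleunpack.py code.

```python
def find_name(part, name):
    """Return the offset of name inside the bytes string part, or -1."""
    # Normalise the needle to bytes first; the haystack is raw markup bytes.
    if not isinstance(name, bytes):
        name = name.encode("utf-8")
    return part.find(name)


if __name__ == "__main__":
    markup = b'<img src="cover00346.jpeg"/>'
    print(find_name(markup, u"cover00346.jpeg"))
```

The idea is simply to pick one type for both sides of the search (bytes here) and convert at the boundary, rather than relying on any implicit conversion.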


But I could not see any issue with HDimage_test.mobi at all. I ran it with the 0.75 code, then diffed the output against that of the new nlib code for both python 2 and python 3, and the only differences I could see were due to the order of xml attributes, which is arbitrary when they are extracted from dicts. Exactly what problem are you having?
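
A sketch of how two opf entries can be compared while ignoring attribute order, using the standard xml.etree.ElementTree module. `same_element` is a hypothetical helper; it only looks at a single element and does not recurse into children.

```python
import xml.etree.ElementTree as ET


def same_element(xml_a, xml_b):
    """True if two single-element XML strings match, ignoring attribute order."""
    a = ET.fromstring(xml_a)
    b = ET.fromstring(xml_b)
    # attrib is a dict, so comparing it ignores the order in the source text.
    return a.tag == b.tag and a.attrib == b.attrib and a.text == b.text


if __name__ == "__main__":
    old = '<item id="inserted" media-type="application/xhtml+xml" href="Text/cover_page.xhtml" />'
    new = '<item href="Text/cover_page.xhtml" id="inserted" media-type="application/xhtml+xml" />'
    print(same_element(old, new))
```

A plain text diff flags these two lines as different, while a comparison on the parsed attributes does not.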

Is this what your patch was for? If so, can you re-post your patch? I cannot unzip it on my Mac or my Linux box at all.

Here is what it says when I try. It seems to need a newer PKZIP-compatible unzip routine than the one I have:


KevinsiMac:Desktop kbhend$ unzip nlib.patch.zip
Archive: nlib.patch.zip
skipping: nlib.patch.txt need PK compat. v6.3 (can do v2.1)
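
The "need PK compat. v6.3" message refers to the "version needed to extract" field stored with each zip entry. Below is a sketch for checking that field, assuming Python's standard zipfile module is able to read the archive directory. `versions_needed` is a hypothetical helper, and the demo archive is created in memory purely for illustration.

```python
import io
import zipfile


def versions_needed(zip_bytes):
    """Map each entry name to its 'version needed to extract' as 'major.minor'."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # extract_version is stored as major * 10 + minor, e.g. 63 for 6.3.
            result[info.filename] = "%d.%d" % divmod(info.extract_version, 10)
    return result


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("demo.txt", b"hello")
    print(versions_needed(buf.getvalue()))
```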


Thanks,

KevinH

ps. I have attached the very latest version of nlib.zip

Last edited by KevinH; 10-08-2014 at 06:43 PM. Reason: remove outdated attachment
Old 10-08-2014, 12:56 PM   #1020
KevinH
Sigil Developer
KevinH ought to be getting tired of karma fortunes by now.
 
Posts: 7,650
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,

On SourceForge I found a p7zip command line program, built it on Mac OS X, and was then able to unpack your nlib.patch.zip.

Many of the changes you had were similar to what I had, and others fixed real porting bugs. But there are a few things I cannot integrate from your patch:

1. In compatibility_utils.py, under Windows, you cannot use the following code, as it will break on almost all Windows machines that do not use the code page you use.

+
+    # Conversion of argv to unicode without using cdll.kernel32.GetCommandLineW
+    FILE_SYSTEM_ENCODING = sys.getfilesystemencoding()
+    uargv = []
+    in_double_quotations = False
+    for arg in sys.argv:
+        if not isinstance(arg, unicode):
+            arg = arg.decode(FILE_SYSTEM_ENCODING)
+        if in_double_quotations:
+            if arg[-1] == u'"':
+                in_double_quotations = False
+                arg = arg[:-1]
+            uargv[-1] = uargv[-1] + u' ' + arg
+        else:
+            if arg[0] == u'"':
+                arg = arg[1:]
+                if arg[-1] == u'"':
+                    arg = arg[:-1]
+                else:
+                    in_double_quotations = True
+            uargv.append(arg)
+    return uargv
+


It is much better to use the kernel32 call, and kindleunpack.py has always used this routine (see utf8_utils.py for utf8_argv) in one form or another. It is needed for full unicode (at least utf-16) compliance on all Windows machines under python 2.
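
For readers who have not seen the kernel32 approach, here is a sketch of the commonly used ctypes recipe. It is not the project's utf8_argv routine; `unicode_argv` is a hypothetical name, and the Windows branch is only meant for python 2, since python 3 already gives unicode in sys.argv.

```python
import sys


def unicode_argv():
    """Return the command line arguments as a list of unicode strings."""
    if sys.version_info[0] >= 3:
        # sys.argv is already unicode text on Python 3.
        return list(sys.argv)
    if sys.platform != "win32":
        # Python 2 elsewhere: decode the byte strings explicitly.
        encoding = sys.getfilesystemencoding() or "utf-8"
        return [arg.decode(encoding) for arg in sys.argv]

    # Python 2 on Windows: ask the OS for the original UTF-16 command line
    # instead of trusting the code page that sys.argv was converted through.
    from ctypes import POINTER, byref, c_int, cdll, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    get_command_line = cdll.kernel32.GetCommandLineW
    get_command_line.argtypes = []
    get_command_line.restype = LPCWSTR

    to_argv = windll.shell32.CommandLineToArgvW
    to_argv.argtypes = [LPCWSTR, POINTER(c_int)]
    to_argv.restype = POINTER(LPWSTR)

    argc = c_int(0)
    argv = to_argv(get_command_line(), byref(argc))
    # Skip the interpreter and its options so the result lines up with sys.argv.
    start = argc.value - len(sys.argv)
    return [argv[i] for i in range(start, argc.value)]


if __name__ == "__main__":
    print(unicode_argv())
```

Because the arguments come straight from the OS as UTF-16, the result does not depend on which code page the machine happens to use.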



2. The code below should only be used as a last resort.

+if sys.version_info[0] == 2:
+    reload(sys)
+    sys.setdefaultencoding('utf-8')

It hides the automatic upconversions that happen when mixing unicode and bytestrings. Those are exactly the things we need to find, track down, and fix. They often happen when printing. The earlier change I made should have taken care of that. If not, I need a traceback. Luckily python 3 always barfs when trying to mix bytes and unicode, so we should be able to find and fix those. We want stdout to be utf-8 encoded so it works on all machines and not just on one specific Windows code page. Please send me the traceback of the problem you meant this to fix.
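
A sketch of the explicit alternative: convert at the boundary and write bytes, instead of changing the default encoding. On python 3 an expression such as b"a" + u"b" raises a TypeError, while python 2 silently decodes the bytes with the default codec, which is the behaviour that setdefaultencoding papers over. The names `as_utf8` and `write_line` are hypothetical, and the in-memory stream stands in for a binary stdout.

```python
import io


def as_utf8(value):
    """Return value as UTF-8 bytes, encoding only if it is unicode text."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


def write_line(stream, text):
    """Write one UTF-8 encoded line to a binary stream."""
    stream.write(as_utf8(text) + b"\n")


if __name__ == "__main__":
    out = io.BytesIO()  # stand-in for a binary stdout such as sys.stdout.buffer
    write_line(out, u"caf\u00e9")
    print(out.getvalue())
```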

3. And finally:

@@ -333,7 +333,7 @@
 flowpart = flows[num]
 if fmt == b'inline':
     tag = flowpart
-else:
+elif pdir is not None and fnm is not None:
     replacement = b'"../' + utf8_str(pdir) + b'/' + utf8_str(fnm) + b'"'
     tag = flow_pattern.sub(replacement, tag, 1)
     self.used[fnm] = 'used'

Why are these extra conditions needed? Are they needed in kindleunpack v0.75 as well? If not, this code is probably just hiding some other underlying problem we need to track down.

Thanks,

KevinH

ps. Is there any way you can get your IDE to stop caring about extra whitespace at the end of lines? Either way I'll run everything through reindent.py to clean up any extra whitespace in the source.

Last edited by KevinH; 10-08-2014 at 06:23 PM.


All times are GMT -4. The time now is 10:50 PM.


MobileRead.com is a privately owned, operated and funded community.