10-05-2014, 11:32 AM   #1011
KevinH
Sigil Developer
Hi tkeo,

Thanks for testing this.

Quote:
Originally Posted by tkeo View Post
1. HDimage_test.mobi (an epub3 fixed layout ebook which I posted before)

Successfully unpacked with python 2; but with python 3, got an error message:

replacement = b'%s%s%s'%(osep, b'../Images/' + imageName, csep)
TypeError: can't concat bytes to str
This was a combination of problems:

- Python 3 does not allow % to be used to fold ASCII or UTF-8 strings into binary data (there is a PEP on this, PEP 461)

- iterating over bytes and extracting single bytes from bytestrings behave differently in Python 3; there is a PEP on this as well (PEP 467), but nothing definite yet

But I have now fixed this.
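
For anyone following along, here is a minimal sketch of the two Python 3 differences listed above. It is not the actual change made in KindleUnpack; the helper names (build_replacement, single_bytes) are made up for illustration, and it assumes the image name may arrive as either text or bytes.

```python
def build_replacement(osep, image_name, csep):
    """Join the pieces as bytes without using % on a bytes object."""
    # Hypothetical helper: encode text to bytes first so that every
    # piece has the same type before joining.
    if not isinstance(image_name, bytes):
        image_name = image_name.encode('utf-8')
    return b''.join([osep, b'../Images/', image_name, csep])


def single_bytes(data):
    """Return the contents of data as a list of length-1 bytestrings."""
    # Indexing (data[i]) gives an int on Python 3 but a 1-character str
    # on Python 2; slicing (data[i:i + 1]) gives a bytestring on both.
    return [data[i:i + 1] for i in range(len(data))]
```

Joining with b''.join and slicing instead of indexing should behave the same under Python 2.7 and Python 3, which is handy for code that has to run on both.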

Quote:
2. test2.awz3 (an epub2 reflowable ebook in English with several images)
Got errors with the both versions.

with python 2:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXXXXXX
EXTH Title: XXXXXXXX
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00172.jpeg from section 172
Extracting image: image00173.jpeg from section 173
Extracting image: image00174.gif from section 174
Extracting image: image00175.gif from section 175
Extracting image: image00176.jpeg from section 176
Extracting image: image00177.gif from section 177
Extracting image: image00178.gif from section 178
Extracting image: cover00179.jpeg from section 179
Extracting image: image00180.jpeg from section 180
Extracting image: image00181.jpeg from section 181
Extracting image: image00183.jpeg from section 183
Unpacking raw markup language
Error: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 456, in processMobi8
rawML = mh.getRawML()
File "mobi_header.py", line 785, in getRawML
dataList.append(self.unpack(data))
File "mobi_uncompress.py", line 131, in unpack
slice = self.unpack(slice)
File "mobi_uncompress.py", line 133, in unpack
s += slice
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)


with python 3:
Spoiler:

Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 869, in unpackBook
mh = MobiHeader(sect,0)
File "mobi_header.py", line 484, in __init__
reader.loadCdic(self.sect.loadSection(huffoff+i))
File "mobi_uncompress.py", line 97, in loadCdic
self.dictionary += lmap(getslice, struct.unpack_from(b'>%dH' % n, cdic, 16))

TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
This is because I have not tried books with huffman cdic compression. I will generate a few test cases and see if I can track this down.
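
The Python 3 traceback above points at b'>%dH' % n, which fails because % cannot be applied to a bytes object there. One possible way around it, sketched under assumptions (unpack_u16_table is a hypothetical helper, not the code in mobi_uncompress.py), is to build the struct format as an ordinary text string, which struct accepts on Python 3:

```python
import struct


def unpack_u16_table(buf, count, offset=0):
    """Read `count` big-endian unsigned 16-bit values from buf at offset."""
    # Build the format as a native text string ('>3H' for count == 3)
    # rather than formatting a bytes literal with %.
    fmt = '>%dH' % count
    return struct.unpack_from(fmt, buf, offset)
```

For example, unpack_u16_table(cdic, n, 16) would read the same n values that the failing line tries to read from the cdic record.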

Quote:
3. kokoro.mobi (an epub3 rtl reflowable ebook in Japanese)
Unpacked as an epub2 ebook instead of the epub3 with the both versions.
This is probably due to a comparison against a string constant where either the variable being tested or the constant itself is a bytestring and the other is unicode, or vice versa.
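
To illustrate that kind of mismatch: on Python 3 a bytestring never compares equal to a text string, so a test that passes on Python 2 can silently take the other branch. A minimal sketch, assuming UTF-8 data (to_text and same_text are hypothetical helpers, not functions from KindleUnpack):

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a bytestring."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def same_text(a, b):
    """Compare two values after converting both to text."""
    return to_text(a) == to_text(b)


# On Python 3 the direct comparison is False even though the content matches.
mismatch = (b'3' == '3')
```

Normalizing both sides to one type before comparing avoids the silent mismatch.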


I have fixes for error 1 in the tree, and I will track down and fix the huffman/cdic code with my own test case. I will post an updated version once I have both errors fixed. Please keep trying these versions on as many test cases as you have so that we can exercise all of the code and track down these last issues.

Thanks,

Kevin