Hi Kevin,
I have tested a few ebooks with the experimental code and got errors. Yes, it has bugs (quite a lot of them, I think).
My test environment:
Python 2.7.6 and 3.3.4.1, Windows 32-bit.
The Windows code page is cp932.
PYTHONIOENCODING=utf-8 is set.
I got the following errors:
1. HDimage_test.mobi (an epub3 fixed-layout ebook which I posted before)
Unpacked successfully with Python 2, but Python 3 gave this error:
Unpacking Book...
Palm DB type: BOOKMOBI, 38 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 5 section of book...
Mobi Version: 5
Codec: utf-8
Title: b'HD Content test'
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image00003.jpeg from section 3
Extracting image: image00004.jpeg from section 4
Extracting image: image00005.jpeg from section 5
Extracting image: image00006.jpeg from section 6
Extracting image: image00007.jpeg from section 7
Extracting image: cover00008.jpeg from section 8
Extracting image: image00010.jpeg from section 10
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: b'HD Content test'
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting HD image: HDimage00029.jpeg from section 29
Extracting HD image: HDimage00030.jpeg from section 30
Extracting HD image: HDimage00031.jpeg from section 31
Extracting HD image: HDimage00032.jpeg from section 32
Extracting HD image: HDimage00034.jpeg from section 34
Unpacking raw markup language
Warning: There are unprocessed index bytes left: b'0000'
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 523, in processMobi8
usedmap = htmlproc.buildXHTML()
File "mobi_html.py", line 367, in buildXHTML
replacement = b'%s%s%s'%(osep, b'../Images/' + imageName, csep)
TypeError: can't concat bytes to str
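The failing line adds the bytes literal b'../Images/' to imageName, which under Python 3 is apparently a str. Also, bytes objects only support % formatting from Python 3.5 (PEP 461), so on 3.3 the b'%s%s%s' % (...) form would fail even if all the types matched. A possible portable replacement, untested against the real mobi_html.py and assuming osep and csep are already bytes (build_image_ref is a hypothetical helper name):

```python
def build_image_ref(osep, image_name, csep):
    # Hypothetical helper: image_name may be str (Python 3) or bytes.
    if not isinstance(image_name, bytes):
        image_name = image_name.encode('utf-8')
    # Join instead of b'%s%s%s' % (...), which needs Python 3.5+.
    return b''.join([osep, b'../Images/', image_name, csep])
```

On Python 2, a plain 'name' literal is already bytes, so the encode branch is skipped there unless unicode_literals is in effect.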
2. test2.azw3 (an epub2 reflowable ebook in English with several images)
Got errors with both versions.
With Python 2:
Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: XXXXXXXX
EXTH Title: XXXXXXXX
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00172.jpeg from section 172
Extracting image: image00173.jpeg from section 173
Extracting image: image00174.gif from section 174
Extracting image: image00175.gif from section 175
Extracting image: image00176.jpeg from section 176
Extracting image: image00177.gif from section 177
Extracting image: image00178.gif from section 178
Extracting image: cover00179.jpeg from section 179
Extracting image: image00180.jpeg from section 180
Extracting image: image00181.jpeg from section 181
Extracting image: image00183.jpeg from section 183
Unpacking raw markup language
Error: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 910, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 827, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, imgnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 456, in processMobi8
rawML = mh.getRawML()
File "mobi_header.py", line 785, in getRawML
dataList.append(self.unpack(data))
File "mobi_uncompress.py", line 131, in unpack
slice = self.unpack(slice)
File "mobi_uncompress.py", line 133, in unpack
s += slice
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
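In Python 2 this error typically means a unicode object is being concatenated with a byte string that contains non-ASCII bytes, so Python tries an implicit ASCII decode. My guess (not yet checked against mobi_uncompress.py) is that s starts out as a unicode literal, for example because of from __future__ import unicode_literals. A minimal sketch of accumulating the decompressed slices strictly as bytes (concat_slices is a hypothetical name, not the actual function):

```python
def concat_slices(slices):
    # Accumulate raw bytes. Starting from '' would give a unicode object
    # under unicode_literals in Python 2, forcing an ASCII decode of 0xc2.
    buf = bytearray()
    for piece in slices:
        buf.extend(piece)  # each piece is expected to be bytes
    return bytes(buf)
```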
With Python 3:
Unpacking Book...
Palm DB type: BOOKMOBI, 190 sections.
Traceback (most recent call last):
File "kindleunpack.py", line 1008, in <module>
sys.exit(main())
File "kindleunpack.py", line 996, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 869, in unpackBook
mh = MobiHeader(sect,0)
File "mobi_header.py", line 484, in __init__
reader.loadCdic(self.sect.loadSection(huffoff+i))
File "mobi_uncompress.py", line 97, in loadCdic
self.dictionary += lmap(getslice, struct.unpack_from(b'>%dH' % n, cdic, 16))
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
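Here b'>%dH' % n fails because bytes objects have no % operator before Python 3.5. One portable option is to build the format as a native str: wrapping it in str() keeps it a byte string on Python 2 even when unicode_literals is active (as far as I know, the struct module in 2.7.6 still rejects unicode format strings). A sketch with a hypothetical helper name, assuming the CDIC entries are big-endian unsigned 16-bit values starting at offset 16 as in the traceback:

```python
import struct

def unpack_be_uint16s(data, count, offset):
    # Build the struct format as a native str: bytes % int raises
    # TypeError on Python 3 < 3.5, and str() stays bytes on Python 2.
    fmt = str('>%dH' % count)
    return struct.unpack_from(fmt, data, offset)
```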
3. kokoro.mobi (an epub3 rtl reflowable ebook in Japanese)
Unpacked as an epub2 ebook instead of epub3 with both versions.
I will look at the code and try to debug it after tomorrow, if I can.
Take care,