KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files - Page 80

elchamaco · 09-25-2015, 03:27 PM

Hi, I have a question that is really about Python, but related to KindleUnpack in some way.

I want to use the azw3 save feature to strip the azw3 out of a combined mobi created by kindlegen. I modified the kindleunpack.py code a bit and it works really fast; now I get an azw3 file in the output dir (I want to save only the azw3, not the rest of the unpack work).

My problem is that when I call kindleunpack.py from the MS-DOS prompt in another directory, it is unable to find the modules it imports; it only works when I'm in the lib directory. I'm very bad with Python. Is there any way to import modules without adding them to the path in Windows, while calling the script from another directory?

Example

C:\kindlegen\kindleunpack\lib\kindleunpack.py

If my directory is c:\kindlegen\
and I use
python kindleunpack\lib\kindleunpack.py

the result is
ImportError: No module named compatibility_utils

If the directory is C:\kindlegen\kindleunpack\lib\ it works fine; Python finds the rest of the modules.

Thanks.

PS: I'm trying to write a batch file to convert books with kindlegen but without the extra size from the old mobi format.
In fact it would be great if a calibre conversion plugin could be made that uses kindlegen, strips out the azw3, and modifies parameters so it is seen as a normal document. But I don't know if it's possible to bridge calibre's azw3 conversion, and I'm really bad with Python, so a calibre plugin is perhaps beyond my capabilities; I'll try the easy way with a batch file in MS-DOS.
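
One possible approach to the import question, as a minimal sketch only: make the script add its own directory to sys.path before it imports its sibling modules. Python normally already puts the directory of the script it runs at the front of sys.path, so the ImportError above may have another cause (for example, the modified copy being launched in a different way); this is a defensive measure, not a confirmed fix. The helper name add_script_dir_to_path is hypothetical and not part of KindleUnpack.

```python
import os
import sys


def add_script_dir_to_path(script_file):
    """Put the directory that contains script_file at the front of sys.path.

    Hypothetical helper for illustration; not part of KindleUnpack.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir


# Inside kindleunpack.py this would be called before the other imports:
#     add_script_dir_to_path(__file__)
# Here a made-up location is used so the sketch runs on its own.
example_script = os.path.join(os.getcwd(), "kindleunpack_demo_lib", "kindleunpack.py")
lib_dir = add_script_dir_to_path(example_script)
```

With this in place the modules are looked up next to the script regardless of the directory the command is run from, and nothing has to be added to the Windows path.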

atonement · 12-22-2015, 08:09 AM

Is it possible to get the standalone version running on Android? Python is available for Android in the form of QPython.

KevinH · 12-22-2015, 04:01 PM

Since I don't own anything running android, I doubt it very much. Have you tried simply moving the python code over and trying?

atonement · 12-23-2015, 06:04 AM

Quote:

Originally Posted by KevinH

Since I don't own anything running android, I doubt it very much. Have you tried simply moving the python code over and trying?

Yes but I am pretty much hopeless at this. Will Android emulators like Bluestacks ( also available for Mac ) be of any use?

elmimmo · 12-23-2015, 06:51 AM

I have a mobi which KindleUnpack v0.80 is not able to unpack. I also tried with the latest version at GitHub. This is what happens:

Code:

$ ./kindleunpack.py book.mobi 
KindleUnpack v0.80
   Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <root@ihack.net>
   Extensive Extensions and Improvements Copyright © 2009-2014 
       by:  P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 118 sections.
Error: 'utf8' codec can't decode byte 0xe8 in position 33: invalid continuation byte
Traceback (most recent call last):
  File "./kindleunpack.py", line 1004, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "./kindleunpack.py", line 878, in unpackBook
    mh = MobiHeader(sect,0)
  File "./mobi_header.py", line 524, in __init__
    self.parseMetaData()
  File "./mobi_header.py", line 818, in parseMetaData
    addValue(name, content.decode(codec))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 33: invalid continuation byte

Tried on Mac OS X 10.10.5, Python 2.7.10 (default, Oct 27 2015, 10:27:07) installed via Homebrew.

pdurrant · 12-23-2015, 08:27 AM

Quote:

Originally Posted by elmimmo

I have a mobi which KindleUnpack v0.80 is not able to unpack. I also tried with the latest version at GitHub. This is what happens:

My guess is that the Mobi is wrong about the text encoding of the metadata. I suppose unpack shouldn't crash when that happens.

Doitsu · 12-23-2015, 08:59 AM

Quote:

Originally Posted by elmimmo

I have a mobi which KindleUnpack v0.80 is not able to unpack. I also tried with the latest version at GitHub. This is what happens:

1. Is this by any chance a book in a language that uses a non-Latin alphabet, e.g. Cyrillic, or accented characters/umlauts in the book metadata?

2. Do you have access to the original source files?

KevinH · 12-23-2015, 10:37 AM

Or alternatively since this is a metadata issue, please try running the latest version of DumpMobiHeader on it and posting the results here. It may similarly error out but the output will tell us what encoding the book is supposed to be using, version, etc.
KevinH

kovidgoyal · 12-23-2015, 10:03 PM

@KevinH: This is almost certainly caused by an issue with the trailing bytes at the end of every text record. There were (long ago) versions of the dedrm tool that used to produce de-drmed mobi files with corrupted headers (extra data flag set to zero). In such files you can end up with text that contains partial utf-8 byte sequences.

elmimmo · 12-24-2015, 09:05 AM

The book is in Spanish, so at most it will have things like accented vowels or so. I do not have access to its source.

Quote:

Originally Posted by KevinH

Or alternatively since this is a metadata issue, please try running the latest version of DumpMobiHeader on it and posting the results here.

I was not familiar with DumpMobiHeader. I downloaded the version posted in this thread, and this was its output:

Code:

DumpMobiHeader
book.mobi .MOBI


First Header Dump from Section 0
Header Version is: 0x6
Header start position is: 0x0
Header Length is: 0x100
  Field:     compression_type   Offset: 0x000   Width:  2   Value: 0x02
  Field:                fill0   Offset: 0x002   Width:  2   Value: 0x00
  Field:          text_length   Offset: 0x004   Width:  4   Value: 0xba34
  Field:         text_records   Offset: 0x008   Width:  2   Value: 0x0c
  Field:     max_section_size   Offset: 0x00a   Width:  2   Value: 0x1000
  Field:          crypto_type   Offset: 0x00c   Width:  2   Value: 0x00
  Field:                fill1   Offset: 0x00e   Width:  2   Value: 0x00
  Field:                magic   Offset: 0x010   Width:  4   Value: MOBI
  Field:        header_length   Offset: 0x014   Width:  4   Value: 0x0100
  Field:                 type   Offset: 0x018   Width:  4   Value: 0x0002
  Field:             codepage   Offset: 0x01c   Width:  4   Value: 0xfde9
  Field:            unique_id   Offset: 0x020   Width:  4   Value: 0x5daedfaf
  Field:              version   Offset: 0x024   Width:  4   Value: 0x0006
  Field:        metaorthindex   Offset: 0x028   Width:  4   Value: 0xffffffff
  Field:        metainflindex   Offset: 0x02c   Width:  4   Value: 0xffffffff
  Field:          index_names   Offset: 0x030   Width:  4   Value: 0xffffffff
  Field:           index_keys   Offset: 0x034   Width:  4   Value: 0xffffffff
  Field:         extra_index0   Offset: 0x038   Width:  4   Value: 0xffffffff
  Field:         extra_index1   Offset: 0x03c   Width:  4   Value: 0xffffffff
  Field:         extra_index2   Offset: 0x040   Width:  4   Value: 0xffffffff
  Field:         extra_index3   Offset: 0x044   Width:  4   Value: 0xffffffff
  Field:         extra_index4   Offset: 0x048   Width:  4   Value: 0xffffffff
  Field:         extra_index5   Offset: 0x04c   Width:  4   Value: 0xffffffff
  Field:        first_nontext   Offset: 0x050   Width:  4   Value: 0x000e
  Field:         title_offset   Offset: 0x054   Width:  4   Value: 0x02b4
  Field:         title_length   Offset: 0x058   Width:  4   Value: 0x0010
  Field:        language_code   Offset: 0x05c   Width:  4   Value: 0x040a
  Field:         dict_in_lang   Offset: 0x060   Width:  4   Value: 0x0000
  Field:        dict_out_lang   Offset: 0x064   Width:  4   Value: 0x0000
  Field:          min_version   Offset: 0x068   Width:  4   Value: 0x0006
  Field:    first_addl_offset   Offset: 0x06c   Width:  4   Value: 0x0011
  Field:          huff_offset   Offset: 0x070   Width:  4   Value: 0x0000
  Field:             huff_num   Offset: 0x074   Width:  4   Value: 0x0000
  Field:      huff_tbl_offset   Offset: 0x078   Width:  4   Value: 0x0000
  Field:         huff_tbl_len   Offset: 0x07c   Width:  4   Value: 0x0000
  Field:           exth_flags   Offset: 0x080   Width:  4   Value: 0x1850
  Field:              fill3_a   Offset: 0x084   Width:  4   Value: 0x0000
  Field:              fill3_b   Offset: 0x088   Width:  4   Value: 0x0000
  Field:              fill3_c   Offset: 0x08c   Width:  4   Value: 0x0000
  Field:              fill3_d   Offset: 0x090   Width:  4   Value: 0x0000
  Field:              fill3_e   Offset: 0x094   Width:  4   Value: 0x0000
  Field:              fill3_f   Offset: 0x098   Width:  4   Value: 0x0000
  Field:              fill3_g   Offset: 0x09c   Width:  4   Value: 0x0000
  Field:              fill3_h   Offset: 0x0a0   Width:  4   Value: 0x0000
  Field:           drm_offset   Offset: 0x0a8   Width:  4   Value: 0xffffffff
  Field:            drm_count   Offset: 0x0ac   Width:  4   Value: 0x0000
  Field:             drm_size   Offset: 0x0b0   Width:  4   Value: 0x0000
  Field:            drm_flags   Offset: 0x0b4   Width:  4   Value: 0x0000
  Field:              fill4_a   Offset: 0x0b8   Width:  4   Value: 0x0000
  Field:              fill4_b   Offset: 0x0bc   Width:  4   Value: 0x0000
  Field:        first_content   Offset: 0x0c0   Width:  2   Value: 0x01
  Field:         last_content   Offset: 0x0c2   Width:  2   Value: 0x4c
  Field:             unknown0   Offset: 0x0c4   Width:  4   Value: 0x0001
  Field:          fcis_offset   Offset: 0x0c8   Width:  4   Value: 0x004e
  Field:           fcis_count   Offset: 0x0cc   Width:  4   Value: 0x0001
  Field:          flis_offset   Offset: 0x0d0   Width:  4   Value: 0x004d
  Field:           flis_count   Offset: 0x0d4   Width:  4   Value: 0x0001
  Field:             unknown1   Offset: 0x0d8   Width:  4   Value: 0x0000
  Field:             unknown2   Offset: 0x0dc   Width:  4   Value: 0x0000
  Field:          srcs_offset   Offset: 0x0e0   Width:  4   Value: 0x004f
  Field:           srcs_count   Offset: 0x0e4   Width:  4   Value: 0x0002
  Field:             unknown3   Offset: 0x0e8   Width:  4   Value: 0xffffffff
  Field:             unknown4   Offset: 0x0ec   Width:  4   Value: 0xffffffff
  Field:                fill5   Offset: 0x0f0   Width:  2   Value: 0x00
  Field:      traildata_flags   Offset: 0x0f2   Width:  2   Value: 0x03
  Field:            ncx_index   Offset: 0x0f4   Width:  4   Value: 0x000e
  Field:             unknown5   Offset: 0x0f8   Width:  4   Value: 0xffffffff
  Field:             unknown6   Offset: 0x0fc   Width:  4   Value: 0xffffffff
  Field:          datp_offset   Offset: 0x100   Width:  4   Value: 0xffffffff
  Field:             unknown7   Offset: 0x104   Width:  4   Value: 0xffffffff
Extra Region Length: 0x0
EXTH Region Length:  0x21ac
EXTH MetaData
    Key: "Published"
        Value: "2012-08-2"
Error: 'utf8' codec can't decode byte 0xe8 in position 33: invalid continuation byte
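
To make the dump above easier to relate to the raw bytes, here is a small sketch that reads a few of the listed fields straight out of a section 0 header. The offsets and widths are taken from the dump; the big-endian byte order, the function name read_basic_fields and the synthetic sample header are assumptions made for illustration, and this is not how DumpMobiHeader itself is written.

```python
import struct


def read_basic_fields(header):
    """Read a few fixed-offset fields from a MOBI section 0 header.

    Offsets follow the dump above; values are assumed to be big-endian.
    """
    compression_type, = struct.unpack_from(">H", header, 0x00)
    codepage, = struct.unpack_from(">L", header, 0x1C)
    version, = struct.unpack_from(">L", header, 0x24)
    return {
        "compression_type": compression_type,
        "magic": header[0x10:0x14],
        "codepage": codepage,
        "version": version,
    }


# Synthetic header carrying the same values as the dump (not a real book).
sample = bytearray(0x28)
struct.pack_into(">H", sample, 0x00, 0x02)    # compression_type
sample[0x10:0x14] = b"MOBI"                   # magic
struct.pack_into(">L", sample, 0x1C, 0xFDE9)  # codepage
struct.pack_into(">L", sample, 0x24, 0x0006)  # version

fields = read_basic_fields(bytes(sample))
print(fields)
```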

DiapDealer · 12-24-2015, 09:19 AM

Quote:

Originally Posted by elmimmo

I was not familiar with DumpMobiHeader. I downloaded the version posted in this thread, and this was its output:

For the record, DumpMobiHeader is included with KindleUnpack. DumpMobiHeader v019 is in the KindleUnpack v0.80.0 bundle and v020 is the very latest on GitHub.

KevinH · 12-24-2015, 03:54 PM

So there is a metadata item that is either binary data that we incorrectly try to interpret as a string, or improperly encoded string data.

The 0xfde9 value for codepage converts to 65001, which is utf-8.
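
The conversion can be checked in a couple of lines. The lookup table below is only an illustrative sketch: the two code page numbers are the standard Windows identifiers, but the table name, the function name codec_for and the fallback default are assumptions, not KindleUnpack's actual code.

```python
# Illustrative mapping from Windows code page numbers to Python codec names.
CODEC_BY_CODEPAGE = {
    1252: "windows-1252",
    65001: "utf-8",
}


def codec_for(codepage, default="windows-1252"):
    """Return a Python codec name for a code page (default is an assumption)."""
    return CODEC_BY_CODEPAGE.get(codepage, default)


codepage = int("fde9", 16)   # the value shown in the header dump
codec = codec_for(codepage)
print(codepage, codec)
```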

So someone probably incorrectly edited the metadata in this mobi (possibly trying to hide something for some reason). If you post it privately for me and PM me the link, I should be able to fix it. Have you tried loading it in calibre? Kovid's utf-8 decoding routines are most likely more robust than ours. We could also try modifying kindleunpack to do the decoding with replacement, or to ignore utf-8 errors.

KevinH
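
For anyone following along, this is roughly what "decoding with replacement or ignoring errors" looks like in Python. It is a sketch written for Python 3 with a made-up byte string (a name holding a latin-1 encoded "è", byte 0xe8); it is not the change that went into KindleUnpack.

```python
# Made-up metadata value: the byte 0xe8 is not valid UTF-8 in this position.
raw = b"Mar\xe8s, Joan"

# Strict decoding (the default) raises, as in the traceback earlier in the thread.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# errors="ignore" silently drops the undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_failed, repr(replaced), repr(ignored))
```

Either option keeps the unpack from crashing, at the cost of losing the characters that could not be decoded.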

KevinH · 12-24-2015, 04:14 PM

Hi,

Please try again with DumpMobiHeader_v021.py, just pushed to my GitHub, and try posting the output here again.

If the problem is improperly encoded metadata, this should work around it and show us where the error might be occurring. We can then see if the bug is in KindleUnpack or in your particular mobi.

Thanks,

KevinH

DiapDealer · 12-30-2015, 12:19 PM

Quote:

Originally Posted by atonement

Is it possible to get the standalone version running on Android? Python is available for Android in the form of QPython.

It's probably possible, but there would likely have to be some pretty serious modifications to make it go. It would have to be QPython, though, seeing as how QPython3's disappointingly behind-the-times Python 3.2 is probably insufficient. There would also have to be a script written to gather the various necessary parameters from the user and launch the "real" script.

The environment is probably too different for there to ever be a single code-base that worked for Lin/Win/Mac/QPython. It could probably be forked, though.

elmimmo · 01-05-2016, 10:37 AM

Just to keep everyone posted, the book KindleUnpack could not unpack had an error:

Quote:

Originally Posted by KevinH

The creator's name seems to have been written in some 8-bit encoding (latin-1?) and is not properly utf-8 encoded.

for which KevinH added a workaround in the latest version of KindleUnpack at GitHub. The book now unpacks successfully (the badly encoded Creator metadata does not survive, but hey!)
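
As an aside, a different strategy that would keep such a value is to fall back to an 8-bit codec when UTF-8 decoding fails. The sketch below is an assumption-laden illustration in Python 3, not the workaround KevinH added: the function name decode_metadata and the choice of latin-1 as the fallback are hypothetical, and the sample bytes are made up.

```python
def decode_metadata(raw, declared="utf-8", fallback="latin-1"):
    """Decode with the declared codec, falling back to an 8-bit codec.

    Hypothetical helper; latin-1 maps every byte to a character, so the
    fallback never raises, but it can yield wrong characters if the data
    was really written in some other encoding.
    """
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return raw.decode(fallback)


good = decode_metadata("Marès".encode("utf-8"))   # valid UTF-8, decoded as declared
bad = decode_metadata(b"Mar\xe8s")                # latin-1 bytes, decoded via fallback
print(good, bad)
```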
