08-13-2015, 02:12 PM | #16 |
Enthusiast
Posts: 38
Karma: 6042
Join Date: Nov 2011
Device: SONY PRS505
|
I suspect that DRMION has more to do with the "Ion" format than with a homegrown encryption algorithm - encryption is almost certainly AES.
The format appears to have been named "YellowJersey" internally, which would explain the "yjf" and "yjr" files that are created -- some project leader is apparently a cycling fan? |
08-13-2015, 02:39 PM | #17 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
It depends what you mean by the word "encrypted". It can certainly be encrypted in the sense that it is obfuscated to prevent text being extracted from it. It would appear, however, that it is not locked to a specific device.
|
08-13-2015, 06:39 PM | #18 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Yes, just looking at the number of bytes with the high bits set. It appears to have been compressed as well. Near the end of the header info is the string LZMA but the data following it do not match any LZMA start bytes I have ever seen. So some encryption (AES CBC) was probably done after the LZMA compression stage.
But if it is encrypted, it must not be tied to the device, given the test results. |
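The high-bit check mentioned in this post can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the tool the poster used; the function name `high_bit_fraction` is made up for the example. Plain ASCII text scores 0.0, while compressed or encrypted data tends to score near 0.5.

```python
def high_bit_fraction(data: bytes) -> float:
    """Return the fraction of bytes whose most significant bit is set."""
    if not data:
        return 0.0
    return sum(1 for b in data if b & 0x80) / len(data)


# Plain ASCII never sets the high bit, so it scores exactly 0.0.
print(high_bit_fraction(b"Plain ASCII text"))

# Two of these four bytes have the high bit set.
print(high_bit_fraction(bytes([0x80, 0x7F, 0xFF, 0x00])))
```

A score near 0.5 cannot by itself distinguish compression from encryption, which is why the poster also looked at the bytes following the LZMA marker.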
08-14-2015, 12:42 AM | #19 | |
just an egg
Posts: 1,586
Karma: 4300000
Join Date: Mar 2015
Device: Kindle, iOS
|
Just in case this is useful to those of you who understand this stuff:
Quote:
|
|
08-14-2015, 01:09 AM | #20 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
base64 is a method of displaying binary data in ASCII format. See our wiki for more details. This is used for images, although less efficient than pure binary. It is similar to what I found in the AZK format used on iOS devices. They use java to parse the eBook.
Dale |
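As a small illustration of the base64 point, here is a sketch using Python's standard `base64` module. The four sample bytes are the start of the PNG signature, chosen only as an example of binary image data; nothing here is specific to KFX or AZK.

```python
import base64

# First four bytes of the PNG file signature, used as sample binary data.
raw = bytes([0x89, 0x50, 0x4E, 0x47])

encoded = base64.b64encode(raw)
print(encoded)  # b'iVBORw=='

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == raw

# Every 3 input bytes become 4 output characters, hence the size overhead.
print(len(base64.b64encode(bytes(300))))  # 400
```

The 4-for-3 expansion is the reason base64 is less efficient than storing the image as pure binary.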
08-14-2015, 01:07 PM | #21 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
It appears that the header (see the earlier image) uses variable-byte-length values (see getVariableWidthValue in mobi_index.py in KindleUnpack). Each obviously human-readable string is preceded by its length, stored as a variable-width value.
I think that to properly reverse this thing we would need someone to disassemble the new code from the update that adds kfx support. From that we may be able to figure out the meaning of each field, exactly how the keys are generated, etc.
My 2 cents ... KevinH |
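A minimal sketch of reading such a length-prefixed string, assuming the forward-encoded convention used for older Kindle structures: seven payload bits per byte, most significant group first, with the high bit set on the final byte. The helper names `read_varint` and `read_prefixed_string` are hypothetical and are not taken from KindleUnpack, and the UTF-8 decoding is an assumption.

```python
def read_varint(data: bytes, offset: int):
    """Decode a forward-encoded variable-width integer.

    Returns (value, offset of the next unread byte).
    """
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # high bit marks the last byte of the value
            return value, offset


def read_prefixed_string(data: bytes, offset: int):
    """Read a string whose byte length is stored as a leading varint."""
    length, offset = read_varint(data, offset)
    text = data[offset:offset + length].decode("utf-8", errors="replace")
    return text, offset + length


# 0x84 is the one-byte encoding of 4, followed by a four-byte string.
sample = bytes([0x84]) + b"LZMA"
print(read_prefixed_string(sample, 0))  # ('LZMA', 5)
```

Returning the next offset alongside the value makes it easy to walk through a header field by field.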
08-14-2015, 03:40 PM | #22 |
Grand Sorcerer
Posts: 6,497
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
Hopefully an expert on kindle firmware will take a look at the software that handles kfx in order to work out more details of how the new format works. I don't have those skills, but I love a mystery and so I decided to see what I could discover just by inspection of the files. I thought that perhaps some of my experience in reading core dumps in a former life could come in handy.
I am making some progress. I tried theorizing on how the files are structured and wrote code to attempt to parse them based on those theories. I iterated through a large number of failures, but now I think that I have a pretty good overall idea of how the files work. So far I have found three innovations in the new format.
First, the auxiliary kfx files in the sdr directory have a format similar to azw6. Those files start with a CONT (container?) block which has a table of pointers to ENTY (entry?, entity?) blocks. Each of those represents a resource, such as an image file. Logically it is equivalent to a directory of files, each identified by a number.
Second is a new data encoding that can be found in all of the kfx-related files. This seems to be a serialized binary object format, like a binary version of JSON and similar to MessagePack. I did some research, but as far as I can tell the object coding being used is new and unique to Amazon. If not, perhaps someone will recognize it from my description. Each data block starts with a header of e0 01 00 ea. The format has compact coding for numbers, strings, blobs, lists, objects, etc. It relies on the variable-length integer format used for some older kindle data structures. The main problem with decoding it is that object and attribute types are identified by an ID number. In order to make sense of the data you need a mapping of these ids to the corresponding object/attribute names. I have been able to figure out several by inspection, such as book title and author and image height and width, but most are not so obvious. There are hundreds of different ids. Hopefully someone will come across a list of mappings.
Third is the apparent use of encryption to hide the content of the primary kfx file, even for non-DRM protected books.
The DRMION format appears to be a list of objects (encoded as described previously), each of which has an encrypted blob and a series of parameters that appear to describe the method for decoding it. I haven't had a chance to research this further yet. Exploring kfx is like peeling an onion. Decoding one level of the file structure just opens another that needs to be examined. |
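One simple way to test the block-header observation against a sample file is to list every offset at which the bytes e0 01 00 ea occur. The sketch below does only that; the name `find_block_offsets` is hypothetical, and the layout of the CONT and ENTY blocks is not modeled here because the thread does not describe it.

```python
# Four-byte header that the post reports at the start of each data block.
SIGNATURE = bytes.fromhex("e00100ea")


def find_block_offsets(data: bytes):
    """Return every offset at which the four-byte signature occurs."""
    offsets = []
    pos = data.find(SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(SIGNATURE, pos + 1)
    return offsets


# Synthetic sample: the signature appears at offsets 4 and 10.
sample = b"CONT" + SIGNATURE + b"xx" + SIGNATURE
print(find_block_offsets(sample))  # [4, 10]
```

A hit inside an encrypted blob may be a coincidence, so offsets found this way are candidates to check rather than confirmed block starts.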
08-14-2015, 03:56 PM | #23 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Well done! That's extremely useful information.
|
08-14-2015, 04:50 PM | #24 |
Grand Sorcerer
Posts: 6,497
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
The kfx binary object encoding makes use of forward encoded variable-width integers as described in the MobileRead Wiki.
Values are represented by a type byte, followed by data. The four most significant bits of the type byte indicate the specific type and the four least significant bits indicate the length of the data that follows. A special value of 14 for the length indicates that a variable-width integer follows the type byte and it indicates the true length of the data.
Type values that I have detected are: 2 for integer (stored big-endian), 8 for string, 10 for binary data, 11 for a list (contains other types), 13 for a dictionary (set of id/value pairs), and 14 for an object (id/data pair). Other type values are used, but I haven't found enough examples to be able to determine what they represent.
Object and attribute ids use variable-width integers. The mapping to their actual names is not encoded in the data.
Last edited by jhowell; 08-14-2015 at 04:51 PM. Reason: typo |
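The type-byte layout described in this post can be sketched as follows. This covers only the scalar cases and rests on several assumptions that the post does not settle: integers are treated as unsigned, strings as UTF-8, and the variable-width length uses the forward encoding in which the high bit marks the final byte. All function names are hypothetical.

```python
# Type codes reported in the post; anything else is left as unknown.
TYPE_NAMES = {2: "integer", 8: "string", 10: "binary",
              11: "list", 13: "dictionary", 14: "object"}


def read_varint(data: bytes, offset: int):
    """Forward-encoded varint: 7 bits per byte, high bit set on the last byte."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, offset


def read_value_header(data: bytes, offset: int):
    """Split a type byte into (type name, data length, offset of the data)."""
    type_byte = data[offset]
    offset += 1
    type_code = type_byte >> 4   # four most significant bits
    length = type_byte & 0x0F    # four least significant bits
    if length == 14:             # true length follows as a varint
        length, offset = read_varint(data, offset)
    name = TYPE_NAMES.get(type_code, "unknown(%d)" % type_code)
    return name, length, offset


def decode_scalar(data: bytes, offset: int):
    """Decode one integer, string, or raw value; returns (name, value, next offset)."""
    name, length, offset = read_value_header(data, offset)
    payload = data[offset:offset + length]
    if name == "integer":
        value = int.from_bytes(payload, "big")  # assumed unsigned
    elif name == "string":
        value = payload.decode("utf-8", errors="replace")  # assumed UTF-8
    else:
        value = payload  # containers and unknown types are returned undecoded
    return name, value, offset + length


print(decode_scalar(bytes([0x22, 0x01, 0x00]), 0))  # ('integer', 256, 3)
print(decode_scalar(bytes([0x83]) + b"kfx", 0))     # ('string', 'kfx', 4)
```

Lists, dictionaries, and objects would need a recursive decoder on top of this, which is left out because their inner layout is only partly described in the thread.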
08-14-2015, 10:48 PM | #25 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You might want to look at the list of mappings from numbers to attributes used in MOBI and EXTH headers; the mapping in kfx might re-use it. There is a partial list in the MobileRead wiki and a more complete list in the calibre source code.
|
08-15-2015, 08:44 AM | #26 |
Wizard
Posts: 3,108
Karma: 60231510
Join Date: Nov 2011
Location: Australia
Device: Kobo Aura H2O, Kindle Oasis, Huwei Ascend Mate 7
|
@jhowell. Thanks for your efforts on this. We are fortunate that you like a challenge. And also fortunate that you are not alone in this.
|
08-19-2015, 03:28 PM | #27 | |
Grand Sorcerer
Posts: 6,497
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
Quote:
The identifiers in use by KFX appear to be grouped into ranges. Values less than 100 relate to the DRM/voucher system used to protect the main book contents. Scattered ranges of values from 150 to about 500 seem to hold book metadata. And higher numbers appear to be dynamically assigned to individual book resources, such as images. Of course I don't know what is inside the main KFX book file due to its encryption. EXTH codes might still be present there, but I doubt it. The bulk of the metadata appears to be placed in the auxiliary KFX files in the SDR directory. |
|
08-19-2015, 10:28 PM | #28 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Too bad -- that would have made life a little easier. I suspect that reverse engineering the format is not going to get very far until an unencrypted sample is obtained, or someone gets hold of the kfxgen tool Amazon is using internally.
|
09-02-2015, 04:21 AM | #29 |
Groupie
Posts: 184
Karma: 10
Join Date: Jul 2008
Location: Queensland, Australia
Device: Appraising the market
|
Will Calibre be able to include KFX files in the near future, i.e. in Add Books mode?
mitch13 |
09-02-2015, 04:23 AM | #30 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Not until the format is reverse engineered which, as you can see from the thread, has not yet happened.
|
|