KFX Format - Page 2

erk · 08-13-2015, 02:12 PM

I suspect that DRMION has more to do with the "Ion" format than with a homegrown encryption algorithm - encryption is almost certainly AES.

The format appears to have been called "YellowJersey" internally, which would explain the "yjf" and "yjr" files that are created -- some project leader is apparently a cycling fan?

HarryT · 08-13-2015, 02:39 PM

Quote:

Originally Posted by eschwartz

If the book works on two different devices, it cannot be encrypted

It depends what you mean by the word "encrypted". It can certainly be encrypted in the sense that it is obfuscated to prevent text being extracted from it. It would appear, however, that it is not locked to a specific device.

KevinH · 08-13-2015, 06:39 PM

Yes, just looking at the number of bytes with the high bits set. It appears to have been compressed as well. Near the end of the header info is the string LZMA but the data following it do not match any LZMA start bytes I have ever seen. So some encryption (AES CBC) was probably done after the LZMA compression stage.

But if it is encrypted, it must not be tied to the device, given the test results.
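The "bytes with the high bits set" check can be sketched roughly as follows. This is only a heuristic, and the function name is made up for illustration: plain ASCII text scores near 0, while compressed or encrypted data tends to score near 0.5. It cannot tell compression and encryption apart.

```python
def high_bit_fraction(data: bytes) -> float:
    """Return the fraction of bytes whose most significant bit is set."""
    if not data:
        return 0.0
    return sum(1 for b in data if b & 0x80) / len(data)


# Illustrative inputs only, not taken from a real KFX file.
text_sample = b"Plain ASCII text never sets the high bit."
binary_sample = bytes(range(256))  # every byte value exactly once

print(high_bit_fraction(text_sample))
print(high_bit_fraction(binary_sample))
```

A score near 0.5 over a large region only says the data looks random; deciding between LZMA output and AES output needs more than this.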

odamizu · 08-14-2015, 12:42 AM

Just in case this is useful to those of you who understand this stuff:

Quote:

Originally Posted by Branch Delay

KFX flags you can set in KFX_CONFIG (root directory, /mnt/us)
These are the defaults.

You can turn on some debugging info that shows ram usage and timing, but nothing useful. Let me know if someone divines anything else useful out of this.

[element]
position_default_format = base64
position_provide_base64 = true
position_provide_short = true
position_provide_long = false
compatible_mode = true

[screen]
ppi = 72
skip_dither = false
use_8bit = false
show_status = false

[debug]
disable_pagination_cache = false
disable_large_section_fallback = false
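For anyone who wants to experiment with these flags from a script, here is a minimal sketch that reads the listing above with Python's standard `configparser`. It assumes the file is ordinary INI syntax, which is how the listing looks; how the Kindle itself parses KFX_CONFIG is not known from this thread. The helper name `load_kfx_config` is made up for illustration.

```python
import configparser

# The defaults quoted above, copied as text.
KFX_CONFIG_TEXT = """\
[element]
position_default_format = base64
position_provide_base64 = true
position_provide_short = true
position_provide_long = false
compatible_mode = true

[screen]
ppi = 72
skip_dither = false
use_8bit = false
show_status = false

[debug]
disable_pagination_cache = false
disable_large_section_fallback = false
"""


def load_kfx_config(text: str) -> configparser.ConfigParser:
    """Parse INI-style config text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_kfx_config(KFX_CONFIG_TEXT)
print(config.sections())
print(config.get("element", "position_default_format"))
print(config.getint("screen", "ppi"))
print(config.getboolean("debug", "disable_pagination_cache"))
```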

DaleDe · 08-14-2015, 01:09 AM

Base64 is a method of representing binary data as ASCII text. See our wiki for more details. This is used for images, although it is less efficient than pure binary. It is similar to what I found in the AZK format used on iOS devices. They use Java to parse the eBook.

Dale
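A quick illustration of the efficiency point, using Python's standard `base64` module: every 3 bytes of binary input become 4 ASCII characters, so the encoded form is about a third larger. The sample bytes are arbitrary and have nothing to do with KFX.

```python
import base64

# Six arbitrary bytes, including values that are not printable ASCII.
raw = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F, 0x01])

encoded = base64.b64encode(raw)      # ASCII-only bytes
decoded = base64.b64decode(encoded)  # round-trips to the original

print(encoded.decode("ascii"))
print(len(raw), "bytes ->", len(encoded), "characters")
```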

KevinH · 08-14-2015, 01:07 PM

Hi,

It appears that the header (see the earlier image) uses variable byte length values (see the getVariableWidthValue in mobi_index.py in KindleUnpack). Each obviously human readable string is preceded by its length (stored as a variable width value).
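A minimal sketch of that idea, assuming the forward encoding used in older MOBI structures: 7 data bits per byte, most significant group first, with the high bit set on the final byte. The function names `read_varint` and `read_prefixed_string` are made up here, and UTF-8 for the strings is an assumption. This is not the KindleUnpack code itself.

```python
def read_varint(data: bytes, offset: int = 0):
    """Decode a forward variable-width integer.

    Returns (value, bytes_consumed). The byte with the high bit set
    is treated as the last byte of the value.
    """
    value = 0
    consumed = 0
    while True:
        if offset + consumed >= len(data):
            raise ValueError("truncated variable-width integer")
        byte = data[offset + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, consumed


def read_prefixed_string(data: bytes, offset: int = 0):
    """Read a string preceded by its varint length.

    Returns (text, offset_after_string). UTF-8 is assumed.
    """
    length, used = read_varint(data, offset)
    start = offset + used
    end = start + length
    if end > len(data):
        raise ValueError("string runs past end of data")
    return data[start:end].decode("utf-8"), end


# Hand-built example: length 4 (0x84) followed by the text "LZMA".
print(read_prefixed_string(b"\x84LZMA"))
```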

I think to properly reverse this thing we would need to have someone disassemble the new code from the update that adds kfx support. From that we may be able to figure out the meaning of each field, how the exact keys are generated (etc).

My 2 cents ...

KevinH

jhowell · 08-14-2015, 03:40 PM

Hopefully an expert on kindle firmware will take a look at the software that handles kfx in order to work out more details of how the new format works. I don't have those skills, but I love a mystery and so I decided to see what I could discover just by inspection of the files. I thought that perhaps some of my experience in reading core dumps in a former life could come in handy.

I am making some progress. I tried theorizing on how the files are structured and wrote code to attempt to parse them based on those theories. I iterated through a large number of failures, but now I think that I have a pretty good overall idea of how the files work. So far I have found three innovations in the new format.

First, the auxiliary kfx files in the sdr directory have a format similar to azw6. Those files start with a CONT (container?) block which has a table of pointers to ENTY (entry?, entity?) blocks. Each of those represents a resource, such as an image file. Logically it is equivalent to a directory of files, each identified by a number.

Second is a new data encoding that can be found in all of the kfx-related files. This seems to be a serialized binary object format, like a binary version of JSON and similar to MessagePack. I did some research, but as far as I can tell the object coding being used is new and unique to Amazon. If not, perhaps someone might recognize it from my description.

Each data block starts with a header of e0 01 00 ea. The format has compact coding for numbers, strings, blobs, integers, lists, objects, etc. It relies on the variable-length integer format used for some older kindle data structures.
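A rough sketch of how one might locate such blocks in a file by scanning for the four-byte header. The function name is hypothetical and the sample bytes are hand-built. Note that the same four bytes are the binary version marker of Amazon Ion 1.0, which would fit the earlier guess about the "Ion" name, though nothing in this thread confirms it.

```python
# The header bytes given in the post.
MARKER = bytes.fromhex("e00100ea")


def find_data_blocks(data: bytes):
    """Return every offset at which the 4-byte header occurs."""
    offsets = []
    pos = data.find(MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(MARKER, pos + 1)
    return offsets


# Hand-built sample: a tag, a marker, three filler bytes, another marker.
sample = b"CONT" + MARKER + b"\x00\x00\x00" + MARKER
print(find_data_blocks(sample))
```

Inside encrypted or compressed payloads the same four bytes can turn up by chance, so a hit is only a candidate block start, not proof of one.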

The main problem with decoding it is that the identification of object and attribute types is by an ID number. In order to make sense of the data you need a mapping of these ids to the corresponding object/attribute name. I have been able to figure out several by inspection, such as book title and author and image height and width, but most are not so obvious. There are hundreds of different ids. Hopefully someone will come across a list of mappings.

Third is the apparent use of encryption to hide the content of the primary kfx file, even for non-DRM protected books. The DRMION format appears to be a list of objects (encoded as described previously), each of which has an encrypted blob and a series of parameters that appear to describe the method for decoding it. I haven't had a chance to research this further yet.

Exploring kfx is like peeling an onion. Decoding one level of the file structure just opens another that needs to be examined.

HarryT · 08-14-2015, 03:56 PM

Well done! That's extremely useful information.

jhowell · 08-14-2015, 04:50 PM

The kfx binary object encoding makes use of forward encoded variable-width integers as described in the MobileRead Wiki.

Values are represented by a type byte, followed by data. The four most significant bits of the type byte indicate the specific type and the four least significant bits indicate the length of the data that follows. A special value of 14 for the length indicates that a variable-width integer follows the type byte and it indicates the true length of the data.

Type values that I have detected are: 2 for integer (stored big-endian), 8 for string, 10 for binary data, 11 for a list (contains other types), 13 for a dictionary (set of id/value pairs), and 14 for an object (id/data pair). Other type values are used, but I haven't found enough examples to be able to determine what they represent.

Object and attribute ids use variable-width integers. The mapping to their actual names is not encoded in the data.
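The description above can be turned into a small decoder sketch. It covers only what the post states: the type/length nibbles, the special length value 14, and the integer, string and binary types. Containers are left out because their inner layout is not spelled out here. All function names are made up for illustration, UTF-8 for strings is an assumption, and the length-15 nibble is simply taken literally since the post does not describe it.

```python
TYPE_NAMES = {2: "integer", 8: "string", 10: "binary",
              11: "list", 13: "dictionary", 14: "object"}


def read_varint(data: bytes, offset: int):
    """Forward variable-width integer: 7 bits per byte, high bit ends it."""
    value = 0
    consumed = 0
    while True:
        if offset + consumed >= len(data):
            raise ValueError("truncated variable-width integer")
        byte = data[offset + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, consumed


def read_value_header(data: bytes, offset: int = 0):
    """Return (type_code, length, payload_offset) for the value at offset."""
    type_byte = data[offset]
    type_code = type_byte >> 4      # four most significant bits
    length = type_byte & 0x0F       # four least significant bits
    pos = offset + 1
    if length == 14:                # true length follows as a varint
        length, used = read_varint(data, pos)
        pos += used
    return type_code, length, pos


def read_scalar(data: bytes, offset: int = 0):
    """Decode an integer, string or binary value.

    Returns (type_name, value, offset_after_value).
    """
    type_code, length, pos = read_value_header(data, offset)
    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("value runs past end of data")
    if type_code == 2:
        value = int.from_bytes(payload, "big")   # stored big-endian
    elif type_code == 8:
        value = payload.decode("utf-8")          # encoding is assumed
    elif type_code == 10:
        value = payload
    else:
        raise NotImplementedError(f"type {type_code} not handled in this sketch")
    return TYPE_NAMES[type_code], value, pos + length


# Hand-built examples, not bytes from a real file.
print(read_scalar(b"\x21\x2a"))
print(read_scalar(b"\x83KFX"))
```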

kovidgoyal · 08-14-2015, 10:48 PM

You might want to look at the list of mappings from numbers to attributes used in MOBI and EXTH headers; the mapping in kfx might re-use that. There is a partial list in the mobileread wiki and a more complete list in the calibre source code.

darryl · 08-15-2015, 08:44 AM

@jhowell. Thanks for your efforts on this. We are fortunate that you like a challenge. And also fortunate that you are not alone in this.

jhowell · 08-19-2015, 03:28 PM

Quote:

Originally Posted by kovidgoyal

You might want to look at the list of mappings from numbers to attributes used in MOBI and EXTH headers; the mapping in kfx might re-use that. There is a partial list in the mobileread wiki and a more complete list in the calibre source code.

As far as I can tell, KFX is not making use of the EXTH codes used by previous kindle formats. I tried to see if the EXTH values made sense as identifiers in the new format, but I haven't come across any overlap between the defined EXTH codes and the KFX ids I see in use. I suspect that they deliberately chose the new values so as not to conflict.

The identifiers in use by KFX appear to be grouped into ranges. Values less than 100 relate to the DRM/voucher system used to protect the main book contents. Scattered ranges of values from 150 to about 500 seem to hold book metadata. And higher numbers appear to be dynamically assigned to individual book resources, such as images.
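As a rough sketch, that grouping could be written as a lookup like the one below. The boundaries are approximations taken from the description above ("less than 100", "150 to about 500", "higher"), the metadata ranges are described as scattered so not every id in that span need be metadata, and the function name and labels are made up.

```python
def classify_kfx_id(id_number: int) -> str:
    """Guess the role of a KFX id from the approximate ranges observed."""
    if id_number < 100:
        return "drm/voucher"
    if 150 <= id_number <= 500:
        # "about 500" in the post; the upper bound is not exact.
        return "metadata (approximate range)"
    if id_number > 500:
        return "book resource (dynamically assigned)"
    return "unclassified"  # 100-149 is not described in the post


for sample_id in (12, 120, 300, 1024):
    print(sample_id, classify_kfx_id(sample_id))
```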

Of course I don't know what is inside the main KFX book file due to its encryption. EXTH codes might still be present there, but I doubt it. The bulk of the metadata appears to be placed in the auxiliary KFX files in the SDR directory.

kovidgoyal · 08-19-2015, 10:28 PM

Too bad -- that would have made life a little easier.

I suspect that reverse engineering the format is not going to get very far until an unencrypted sample is obtained, or someone gets hold of the kfxgen tool Amazon is using internally.

mitch13 · 09-02-2015, 04:21 AM

Will Calibre be able to include KFX files in the near future, i.e. in Add Books mode?

mitch13

HarryT · 09-02-2015, 04:23 AM

Not until the format is reverse engineered, which, as you can see from the thread, has not yet happened.
