08-12-2019, 05:59 PM   #1
jhowell
Grand Sorcerer
Posts: 7,094
Karma: 91592869
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
KRDS - A parser for Kindle reader data store files

A recent discussion prompted me to look into how annotations are stored on Kindle devices running recent firmware versions.

Information related to each book being read is saved in a pair of sidecar files in the book's .sdr folder. These files contain serialized data objects used by the e-book reader application. The first file contains objects that change with every page turn, such as the last page read and reading timing data. The second file contains less frequently changed data, such as personal annotations, font and dictionary choices, and the synced reading position.

The file extensions used depend on the book format:
  • KF8 (.azw3) format: .azw3f and .azw3r
  • KFX format: .yjf and .yjr
  • MOBI (.azw) format: .mbs and .mbp1
  • PDF format: .pdt and .pds
  • Topaz (.azw1) format: .tas and .tal
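Given the extension pairs above, a script can pick out the data store files in a book's .sdr folder by extension alone, without needing to know how the files themselves are named. The following Python 3 sketch does that. The names `SIDECAR_EXTENSIONS`, `classify_sidecar` and `find_sidecars` are hypothetical, and the assumption that each pair is listed as (frequently changed file, annotations file) is inferred from the order of the description above and from the sample .yjr output further down; it has not been confirmed for every format.

```python
import os

# Sidecar extension pairs per book format, copied from the list above.
# ASSUMPTION: each pair is ordered (frequently changed file, annotations file),
# matching the order in which the two files are described in the text.
SIDECAR_EXTENSIONS = {
    "KF8": (".azw3f", ".azw3r"),
    "KFX": (".yjf", ".yjr"),
    "MOBI": (".mbs", ".mbp1"),
    "PDF": (".pdt", ".pds"),
    "Topaz": (".tas", ".tal"),
}


def classify_sidecar(filename):
    """Return (format, role) for a sidecar file name, or None if unrecognized."""
    ext = os.path.splitext(filename)[1].lower()
    for fmt, (frequent, annotations) in SIDECAR_EXTENSIONS.items():
        if ext == frequent:
            return fmt, "frequent"
        if ext == annotations:
            return fmt, "annotations"
    return None


def find_sidecars(sdr_dir):
    """List (path, format, role) for recognized sidecar files, sorted by name."""
    found = []
    for name in sorted(os.listdir(sdr_dir)):
        info = classify_sidecar(name)
        if info is not None:
            found.append((os.path.join(sdr_dir, name),) + info)
    return found
```

For example, `find_sidecars("Example Book.sdr")` (a hypothetical folder name) would return a tuple such as `(path, "KFX", "annotations")` for a .yjr file found there. Matching on the extension rather than on a full file name avoids guessing the naming scheme the firmware uses inside .sdr folders.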

The data format appears to be proprietary to Amazon and is similar to the Amazon Ion Binary Encoding used by KFX. It encodes the name of each object being serialized along with a list of property values. Each value has an associated data type, such as integer or string. Decoding objects therefore requires knowledge of the data structure associated with each class.
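Because the property names apparently are not stored alongside the values, a decoder has to supply them from a per-class schema. The toy model below illustrates only that idea; it does not read the real byte format. It assumes the values have already been read from the file as Python objects, and `SCHEMAS` and `decode_object` are hypothetical names. The font.prefs property names are taken from the sample output in this post, but their order and types are assumptions.

```python
# HYPOTHETICAL schema table: class name -> ordered (property name, Python type).
# The font.prefs names come from the sample output; order and types are assumed.
SCHEMAS = {
    "font.prefs": [
        ("typeface", str), ("lineSp", int), ("size", int), ("align", int),
        ("insetTop", int), ("insetLeft", int), ("insetBottom", int),
        ("insetRight", int), ("unknown1", int), ("bold", int),
        ("userSideloadableFont", str), ("customFontIndex", int),
        ("mobi7SystemFont", str), ("mobi7RestoreFont", bool),
        ("readingPresetSelected", str),
    ],
}


def decode_object(class_name, values):
    """Attach property names from SCHEMAS to a list of already-decoded values."""
    schema = SCHEMAS.get(class_name)
    if schema is None:
        raise KeyError("no schema for class %r" % class_name)
    if len(values) != len(schema):
        raise ValueError("%s expects %d values, got %d"
                         % (class_name, len(schema), len(values)))
    obj = {}
    for (name, expected_type), value in zip(schema, values):
        # Exact type check: bool is a subclass of int, so isinstance() would
        # wrongly accept True for an int property.
        if type(value) is not expected_type:
            raise TypeError("%s.%s: expected %s, got %s" % (
                class_name, name, expected_type.__name__, type(value).__name__))
        obj[name] = value
    return obj
```

Checking both the value count and the exact type of each value is one way to notice early when a class layout differs from what the decoder expects; otherwise a single missing or extra value would silently shift every later property name.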


KRDS (Kindle Reader Data Store)

I have written a Python script to parse these files. The main function accepts an input file name, parses it into a Python data structure, and outputs the result as a human-readable JSON file.

I reverse-engineered the data structures for several classes commonly used by the Kindle reader, but it is likely that I missed some things. Reports of any file that is not handled properly are welcome.

For the latest version, see this post by tomsem.


Usage

Download and unzip the attachment to this post to obtain "krds.py". It should run under recent versions of Python 2 or 3.

Code:
usage: python krds.py [-h] pathname

Convert Kindle reader data store files to JSON

positional arguments:
  pathname    Pathname to be processed (.azw3f, .azw3r, .mbp1, .mbs, .yjf, .yjr)

optional arguments:
  -h, --help  show this help message and exit
Enclose the name of the file to be converted in double quotes if it contains spaces.
The output file will have the same name with ".json" appended.
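The command-line behaviour described above (one positional pathname, with the output written next to the input under the same name plus ".json") could be reproduced along these lines. This is not the code from krds.py: the decoder is passed in as a hypothetical `parse` callable so the wrapper can be exercised on its own, and the use of `open(..., encoding=...)` means this sketch targets Python 3 only.

```python
import argparse
import json


def convert(pathname, parse):
    """Decode `pathname` with `parse` and save the result next to it as JSON.

    `parse` is a HYPOTHETICAL stand-in for a real KRDS decoder: any callable
    that takes a file name and returns JSON-serializable Python data.
    """
    data = parse(pathname)
    out_path = pathname + ".json"  # same name with ".json" appended
    with open(out_path, "w", encoding="utf-8") as f:
        # indent=4 and the default ASCII escaping of characters such as
        # U+FFFC give the same layout as the sample output in this post.
        json.dump(data, f, indent=4)
        f.write("\n")
    return out_path


def main(argv, parse):
    """Command-line entry point mirroring the usage text above."""
    parser = argparse.ArgumentParser(
        description="Convert Kindle reader data store files to JSON")
    parser.add_argument(
        "pathname",
        help="Pathname to be processed (.azw3f, .azw3r, .mbp1, .mbs, .yjf, .yjr)")
    args = parser.parse_args(argv)
    return convert(args.pathname, parse)
```

For instance, `main(["Example Book.yjr"], parse=some_decoder)`, where both names are hypothetical, would write "Example Book.yjr.json". The whole path arrives as a single argument, which is why the shell needs the double quotes mentioned above when the name contains spaces.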



Sample Output

Decoded .yjr file:
Code:
{
    "font.prefs": {
        "typeface": "_INVALID_,und:bookerly",
        "lineSp": 1,
        "size": 5,
        "align": 1,
        "insetTop": -1,
        "insetLeft": -1,
        "insetBottom": -1,
        "insetRight": -1,
        "unknown1": -1,
        "bold": 1,
        "userSideloadableFont": "",
        "customFontIndex": -1,
        "mobi7SystemFont": "",
        "mobi7RestoreFont": false,
        "readingPresetSelected": ""
    },
    "sync_lpr": true,
    "annotation.cache.object": {
        "annotation.personal.highlight": [
            {
                "startPosition": "ATwDAAAAAAAA:3803",
                "endPosition": "ATwDAAADAQAA:4062",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AS0DAAAAAAAA:1696",
                "endPosition": "AS0DAADoAAAA:1928",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AWsDAAAAAAAA:12846",
                "endPosition": "AW0DAAB7AQAA:13491",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "ATUDAAAAAAAA:1975",
                "endPosition": "ATsDAAAtAgAA:3802",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AUQDAAAAAAAA:5510",
                "endPosition": "AUgDAAADAQAA:6194",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "ASsDAAAAAAAA:1477",
                "endPosition": "ASsDAABOAAAA:1555",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AW8DAAAAAAAA:13552",
                "endPosition": "ASIEAABwAAAA:42227",
                "creationTime": "2019-08-11T15:24:03.030000",
                "lastModificationTime": "2019-08-11T15:24:03.030000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AWkDAAAAAAAA:12350",
                "endPosition": "AWkDAADvAAAA:12589",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AT8DAAAAAAAA:4154",
                "endPosition": "AUADAAAxAQAA:4745",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            }
        ],
        "annotation.personal.note": [
            {
                "startPosition": "AUADAAAxAQAA:4745",
                "endPosition": "AUADAAAxAQAA:4745",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0",
                "note": "Here is another note for the book"
            },
            {
                "startPosition": "ATwDAAADAQAA:4062",
                "endPosition": "ATwDAAADAQAA:4062",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0",
                "note": "This is my first  note in this book"
            },
            {
                "startPosition": "AWwDAACcAAAA:13111",
                "endPosition": "AWwDAACcAAAA:13111",
                "creationTime": "2019-08-11T15:24:03.079000",
                "lastModificationTime": "2019-08-11T15:24:03.079000",
                "template": "0\ufffc0",
                "note": "More notes"
            },
            {
                "startPosition": "ASIEAABwAAAA:42227",
                "endPosition": "ASIEAABwAAAA:42227",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0",
                "note": "A really long highlight"
            }
        ],
        "annotation.personal.bookmark": [
            {
                "startPosition": "AVoDAAAAAAAA:9430",
                "endPosition": "AVoDAAAAAAAA:9430",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AUsDAAAAAAAA:6642",
                "endPosition": "AUsDAAAAAAAA:6642",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            }
        ]
    },
    "ReaderMetrics": {
        "booklaunchedbefore": "true"
    },
    "erl": "AcgiAAA0AAAA:1206501"
}
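Once a file has been decoded, the JSON is easy to post-process. The sketch below is an illustration and not part of krds.py: it reads the `annotation.cache.object` section of a decoded .yjr file and attaches each note to the highlight it appears to belong to. Both assumptions are drawn only from the sample above. First, the integer after the colon in a position string appears to increase through the book. Second, a note seems to be anchored at its highlight's `endPosition`: three of the four sample notes match a highlight this way, while "More notes" does not. The function names are hypothetical.

```python
import json


def position_offset(position):
    """Return the integer after the last ':' in a position string.

    ASSUMPTION: this number grows through the book, so it can be used for sorting.
    """
    return int(position.rsplit(":", 1)[1])


def notes_with_highlights(decoded):
    """Pair each personal note with the highlight it appears to belong to.

    Returns a list of (highlight or None, note) tuples sorted by note position.
    """
    cache = decoded.get("annotation.cache.object", {})
    highlights = cache.get("annotation.personal.highlight", [])
    notes = cache.get("annotation.personal.note", [])
    # ASSUMPTION (from the sample only): a note sits at its highlight's endPosition.
    by_end = {h["endPosition"]: h for h in highlights}
    pairs = [(by_end.get(n["startPosition"]), n) for n in notes]
    pairs.sort(key=lambda pair: position_offset(pair[1]["startPosition"]))
    return pairs


def load_decoded(json_path):
    """Load the JSON file written for a decoded data store file."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)
```

Applied to the sample above, the first pair would hold the note "This is my first  note in this book" together with the highlight ending at position 4062, and "More notes" would come back with `None` in place of a highlight.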

Last edited by jhowell; 06-10-2023 at 07:04 AM. Reason: Link to new version by tomsem