08-12-2019, 05:59 PM   #1
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
KRDS - A parser for Kindle reader data store files

A recent discussion prompted me to look into how annotations are stored on Kindle devices running recent firmware versions.

Information related to each book being read is saved in a pair of sidecar files in the book's .sdr folder. These files contain serialized data objects used by the e-book reader application. The first file contains objects that change with every page turn such as the last page read and reading timing. The second file contains less frequently changed data such as personal annotations, font & dictionary choices, and synced reading position.

The file extensions used depend on the book format:
  • KF8 (.azw3) format: .azw3f and .azw3r
  • KFX format: .yjf and .yjr
  • MOBI (.azw) format: .mbs and .mbp1
  • PDF format: .pdt and .pds
  • Topaz (.azw1) format: .tas and .tal
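The extension list above can be expressed as a small lookup table, for example to tell which book format a given sidecar file belongs to. This is a minimal sketch; the names `SIDECAR_EXTENSIONS` and `sidecar_format` are hypothetical and not part of krds.py.

```python
import os

# Sidecar extensions per book format, taken from the list above.
SIDECAR_EXTENSIONS = {
    "KF8 (.azw3)": (".azw3f", ".azw3r"),
    "KFX": (".yjf", ".yjr"),
    "MOBI (.azw)": (".mbs", ".mbp1"),
    "PDF": (".pdt", ".pds"),
    "Topaz (.azw1)": (".tas", ".tal"),
}

# Reverse index: sidecar extension -> book format.
FORMAT_BY_EXTENSION = {
    ext: fmt for fmt, exts in SIDECAR_EXTENSIONS.items() for ext in exts
}


def sidecar_format(pathname):
    """Return the book format a sidecar file belongs to, or None if unrecognized."""
    extension = os.path.splitext(pathname)[1].lower()
    return FORMAT_BY_EXTENSION.get(extension)
```

For example, `sidecar_format("Some Book.sdr/Some Book.yjr")` returns "KFX" (the path is a placeholder).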

The data format appears to be proprietary to Amazon and is similar to the Amazon Ion Binary Encoding used by KFX. It encodes the name of each object being serialized along with a list of property values. Values each have an associated data type, such as integer or string. Decoding objects requires knowledge of the data structure associated with each class.
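To make the idea of type-tagged values concrete, here is a minimal sketch of decoding such a stream in Python. Everything about the wire format in it is hypothetical: the one-byte type codes, the big-endian byte order, the length-prefixed UTF-8 strings and the "lpr" schema are illustrative choices, not the actual KRDS encoding (krds.py is the reference for that). It only shows why a decoder needs per-class knowledge: the stream carries the object name and typed values, while the property names come from a schema the decoder must already know.

```python
import io
import struct

# HYPOTHETICAL one-byte type codes, for illustration only (not the real KRDS codes).
TYPE_BOOL, TYPE_INT, TYPE_LONG, TYPE_UTF = 0, 1, 2, 3

# HYPOTHETICAL class schema: the stream carries values but not property names,
# so the decoder must know each class's property order in advance.
SCHEMAS = {"lpr": ("position", "time")}


def read_value(stream):
    """Read one value: a 1-byte type code followed by a big-endian payload."""
    type_code = stream.read(1)[0]
    if type_code == TYPE_BOOL:
        return stream.read(1)[0] != 0
    if type_code == TYPE_INT:
        return struct.unpack(">i", stream.read(4))[0]
    if type_code == TYPE_LONG:
        return struct.unpack(">q", stream.read(8))[0]
    if type_code == TYPE_UTF:
        (length,) = struct.unpack(">H", stream.read(2))
        return stream.read(length).decode("utf-8")
    raise ValueError("unknown type code %d" % type_code)


def read_object(stream):
    """Read an object name, then its property values in schema order."""
    name = read_value(stream)
    return name, {prop: read_value(stream) for prop in SCHEMAS[name]}


def write_value(value):
    """Encode a value with the same hypothetical scheme (handy for test data)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return struct.pack(">BB", TYPE_BOOL, int(value))
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return struct.pack(">Bi", TYPE_INT, value)
        return struct.pack(">Bq", TYPE_LONG, value)
    data = value.encode("utf-8")
    return struct.pack(">BH", TYPE_UTF, len(data)) + data
```

For example, `read_object(io.BytesIO(write_value("lpr") + write_value("1 0 0 0") + write_value(1234567890123)))` returns `("lpr", {"position": "1 0 0 0", "time": 1234567890123})`.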


KRDS (Kindle Reader Data Store)

I have written a Python script to parse these files. The main function accepts an input file name, parses it into a Python data structure, and outputs the result as a human-readable JSON file.

I reverse engineered the data structures for several classes commonly used by the Kindle reader, but it is likely that I missed some things. Reports of any file that is not handled properly are welcome.

For the latest version see this post by tomsem.


Usage

Download and unzip the attachment to this post to obtain "krds.py". It should run under recent versions of Python 2 or 3.

Code:
usage: python krds.py [-h] pathname

Convert Kindle reader data store files to JSON

positional arguments:
  pathname    Pathname to be processed (.azw3f, .azw3r, .mbp1, .mbs, .yjf, .yjr)

optional arguments:
  -h, --help  show this help message and exit
Enclose the name of the file to be converted in double quotes if it contains spaces.
The output file will have the same name with ".json" appended.
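To convert every sidecar file in a book's .sdr folder in one go, the command can be driven from Python. This is a minimal sketch assuming krds.py is in the current directory (or its location is passed as `script`); the helper names are hypothetical. Passing the arguments as a list sidesteps the quoting issue mentioned above, and the output names follow the ".json" rule.

```python
import os
import subprocess
import sys

# Extensions listed in the usage message above; add others as needed.
KRDS_EXTENSIONS = (".azw3f", ".azw3r", ".mbp1", ".mbs", ".yjf", ".yjr")


def sidecar_paths(sdr_folder):
    """Return the sidecar files in a .sdr folder that krds.py can process."""
    return sorted(
        os.path.join(sdr_folder, name)
        for name in os.listdir(sdr_folder)
        if name.lower().endswith(KRDS_EXTENSIONS)
    )


def json_path(pathname):
    """krds.py writes its output next to the input, with ".json" appended."""
    return pathname + ".json"


def convert_folder(sdr_folder, script="krds.py"):
    """Run krds.py on each sidecar file (assumes krds.py is at `script`)."""
    for path in sidecar_paths(sdr_folder):
        # An argument list needs no shell quoting, even for names with spaces.
        subprocess.check_call([sys.executable, script, path])
        print("wrote", json_path(path))
```

For example, `convert_folder("path/to/Some Book.sdr")`, where the path is a placeholder.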



Sample Output

Decoded .yjr file:
Code:
{
    "font.prefs": {
        "typeface": "_INVALID_,und:bookerly",
        "lineSp": 1,
        "size": 5,
        "align": 1,
        "insetTop": -1,
        "insetLeft": -1,
        "insetBottom": -1,
        "insetRight": -1,
        "unknown1": -1,
        "bold": 1,
        "userSideloadableFont": "",
        "customFontIndex": -1,
        "mobi7SystemFont": "",
        "mobi7RestoreFont": false,
        "readingPresetSelected": ""
    },
    "sync_lpr": true,
    "annotation.cache.object": {
        "annotation.personal.highlight": [
            {
                "startPosition": "ATwDAAAAAAAA:3803",
                "endPosition": "ATwDAAADAQAA:4062",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AS0DAAAAAAAA:1696",
                "endPosition": "AS0DAADoAAAA:1928",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AWsDAAAAAAAA:12846",
                "endPosition": "AW0DAAB7AQAA:13491",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "ATUDAAAAAAAA:1975",
                "endPosition": "ATsDAAAtAgAA:3802",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AUQDAAAAAAAA:5510",
                "endPosition": "AUgDAAADAQAA:6194",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "ASsDAAAAAAAA:1477",
                "endPosition": "ASsDAABOAAAA:1555",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AW8DAAAAAAAA:13552",
                "endPosition": "ASIEAABwAAAA:42227",
                "creationTime": "2019-08-11T15:24:03.030000",
                "lastModificationTime": "2019-08-11T15:24:03.030000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AWkDAAAAAAAA:12350",
                "endPosition": "AWkDAADvAAAA:12589",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AT8DAAAAAAAA:4154",
                "endPosition": "AUADAAAxAQAA:4745",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            }
        ],
        "annotation.personal.note": [
            {
                "startPosition": "AUADAAAxAQAA:4745",
                "endPosition": "AUADAAAxAQAA:4745",
                "creationTime": "2019-08-11T15:24:03.083000",
                "lastModificationTime": "2019-08-11T15:24:03.083000",
                "template": "0\ufffc0",
                "note": "Here is another note for the book"
            },
            {
                "startPosition": "ATwDAAADAQAA:4062",
                "endPosition": "ATwDAAADAQAA:4062",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0",
                "note": "This is my first  note in this book"
            },
            {
                "startPosition": "AWwDAACcAAAA:13111",
                "endPosition": "AWwDAACcAAAA:13111",
                "creationTime": "2019-08-11T15:24:03.079000",
                "lastModificationTime": "2019-08-11T15:24:03.079000",
                "template": "0\ufffc0",
                "note": "More notes"
            },
            {
                "startPosition": "ASIEAABwAAAA:42227",
                "endPosition": "ASIEAABwAAAA:42227",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0",
                "note": "A really long highlight"
            }
        ],
        "annotation.personal.bookmark": [
            {
                "startPosition": "AVoDAAAAAAAA:9430",
                "endPosition": "AVoDAAAAAAAA:9430",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "AUsDAAAAAAAA:6642",
                "endPosition": "AUsDAAAAAAAA:6642",
                "creationTime": "2019-08-11T15:24:03.088000",
                "lastModificationTime": "2019-08-11T15:24:03.088000",
                "template": "0\ufffc0"
            }
        ]
    },
    "ReaderMetrics": {
        "booklaunchedbefore": "true"
    },
    "erl": "AcgiAAA0AAAA:1206501"
}
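Once a file has been decoded, the JSON can be post-processed with ordinary Python. As a minimal sketch, the functions below load a decoded .yjr file and list its highlights, notes and bookmarks sorted by creation time. The key names are taken from the sample output above; other formats or firmware versions may use different keys, and the function names are hypothetical.

```python
import json
from datetime import datetime

# Annotation lists as they appear under "annotation.cache.object" in the sample.
ANNOTATION_KINDS = (
    "annotation.personal.highlight",
    "annotation.personal.note",
    "annotation.personal.bookmark",
)


def load_decoded(json_path):
    """Load a JSON file written by krds.py."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def list_annotations(decoded):
    """Return (kind, created, startPosition, note) tuples sorted by creation time."""
    cache = decoded.get("annotation.cache.object", {})
    rows = []
    for kind in ANNOTATION_KINDS:
        for item in cache.get(kind, []):
            # fromisoformat (Python 3.7+) accepts times with or without microseconds.
            created = datetime.fromisoformat(item["creationTime"])
            short_kind = kind.rsplit(".", 1)[1]  # e.g. "highlight"
            rows.append((short_kind, created, item["startPosition"], item.get("note", "")))
    rows.sort(key=lambda row: row[1])
    return rows
```

For example, `list_annotations(load_decoded("Some Book.yjr.json"))`, with a placeholder file name, gives one row per highlight, note and bookmark.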

Last edited by jhowell; 06-10-2023 at 07:04 AM. Reason: Link to new version by tomsem
08-12-2019, 06:31 PM   #2
j.p.s
Grand Sorcerer
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Wow! Thanks!

I'll try kicking the tires when I get some time.
08-12-2019, 07:54 PM   #3
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
Quote:
Originally Posted by j.p.s
I'll try kicking the tires when I get some time.

I hope that what I found out can be useful in your project.
08-13-2019, 03:28 PM   #4
PoP
 curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
Quote:
Originally Posted by jhowell
...
the file extensions are .mbs and .mbp1 (for MOBI), .azw3f and .azw3r (for KF8), and .yjf and .yjr (for KFX)
...

Thanks for sharing. Also found that .pds and .pdt (for PDF) have that signature and decode similarly.


.pds
Code:
{
    "font.prefs": {
        "typeface": "_INVALID_,und:bookerly",
        "lineSp": -1,
        "size": -1,
        "align": -1,
        "insetTop": 28,
        "insetLeft": 28,
        "insetBottom": 0,
        "insetRight": 28,
        "unknown1": -1,
        "bold": -1,
        "userSideloadableFont": "",
        "customFontIndex": -1,
        "mobi7SystemFont": "_INVALID_,und:bookerly",
        "mobi7RestoreFont": false,
        "readingPresetSelected": ""
    },
    "sync_lpr": false,
    "annotation.cache.object": {
        "annotation.personal.highlight": [
            {
                "startPosition": "1 6 51 1 255 177 53 17",
                "endPosition": "1 48 280 1 305 345 79 21",
                "creationTime": "2019-08-13T16:05:05.820000",
                "lastModificationTime": "2019-08-13T16:05:05.820000",
                "template": "0\ufffc0"
            },
            {
                "startPosition": "1 67 361 1 123 445 63 17",
                "endPosition": "1 108 9 1 450 1049 13 21",
                "creationTime": "2019-08-13T16:06:03.643000",
                "lastModificationTime": "2019-08-13T16:06:03.643000",
                "template": "0\ufffc0"
            }
        ],
        "annotation.personal.note": [
            {
                "startPosition": "1 48 280 1 305 345 79 21",
                "endPosition": "1 48 280 1 305 345 79 21",
                "creationTime": "2019-08-13T16:05:46.309000",
                "lastModificationTime": "2019-08-13T16:05:46.309000",
                "template": "0\ufffc0",
                "note": "Ingr\u00e9dients"
            },
            {
                "startPosition": "1 108 9 1 450 1049 13 21",
                "endPosition": "1 108 9 1 450 1049 13 21",
                "creationTime": "2019-08-13T16:06:23.148000",
                "lastModificationTime": "2019-08-13T16:06:23.148000",
                "template": "0\ufffc0",
                "note": "Recette"
            }
        ],
        "annotation.personal.bookmark": [
            {
                "startPosition": "1 0 0 0",
                "endPosition": "1 0 0 0",
                "creationTime": "2019-08-13T16:01:53.130000",
                "lastModificationTime": "2019-08-13T16:01:53.130000",
                "template": "0\ufffc0"
            }
        ]
    },
    "language.store": {
        "language": "fr",
        "unknown1": 0
    },
    "ReaderMetrics": {
        "booklaunchedbefore": "true"
    }
}
.pdt
Code:
{
    "fpr": {
        "position": "80 0 0 0",
        "time": null,
        "timeZoneOffset": null,
        "country": "",
        "device": ""
    },
    "page.history.store": [],
    "lpr": {
        "position": "1 0 0 0",
        "time": "2019-08-13T16:06:55.068000"
    }
}

Last edited by PoP; 08-13-2019 at 04:17 PM. Reason: add generated json
08-13-2019, 07:04 PM   #5
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
Quote:
Originally Posted by PoP
Thanks for sharing. Also found that .pds and .pdt (for PDF) have that signature and decode similarly.

Thanks for the info.

I tested a Topaz (.azw1) file and it uses .tal and .tas files with the same type of content.

I will update the first post to add this information.
08-14-2019, 07:36 AM   #6
shamanNS
Guru
Posts: 885
Karma: 10113994
Join Date: Feb 2010
Location: Serbia
Device: Kindle PW5 [bricked], Kindle PW1
So, this script does not extract the actual text that was highlighted?
08-14-2019, 08:09 AM   #7
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
Quote:
Originally Posted by shamanNS
So, this script does not extract the actual text that was highlighted?

That is correct. The script decodes whatever is in the files indicated in the first post of this thread. The reader application has no need to store the actual text separately from the book format file.

The link between the files that this program decodes and the book's content is provided by the fields with "position" in their names. These are strings that identify where to find content within a book, and they are interpreted differently for each book format.
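A small helper can pull every such field out of a decoded file, whatever the book format, by walking the JSON recursively and keeping string values whose key contains "position". This is a minimal sketch with a hypothetical function name. A purely name-based filter misses values such as "erl" in the first post's sample, which holds a position string under a different name.

```python
def find_positions(node, path=""):
    """Yield (path, value) for each string field whose key contains "position"."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = path + "/" + key
            if "position" in key.lower() and isinstance(value, str):
                yield child, value
            else:
                yield from find_positions(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_positions(item, "%s[%d]" % (path, index))
```

For example, `dict(find_positions(decoded))` maps paths such as "/lpr/position" to their position strings.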

KF8 (azw3) format appears to be the simplest case. The position is a decimal number giving an offset within the raw HTML content of the book, as can be obtained using the kindleunpack software. See the work done by j.p.s for an example of how to make use of this information.
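As a rough illustration of the KF8 case, the function below cuts a highlight out of the raw HTML and strips the markup. It is a sketch under assumptions that are not confirmed here: the raw HTML is available as bytes (for example from KindleUnpack output), positions are byte offsets into it, the end position is inclusive, and the text is UTF-8. The function name is hypothetical, and the tag stripping is crude: a fragment that starts or ends inside a tag keeps the partial tag text.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")


def kf8_highlight_text(raw_html, start, end):
    """Return the visible text between two KF8 positions.

    ASSUMPTIONS (not confirmed): raw_html is the book's raw HTML as bytes,
    positions are byte offsets into it, end is inclusive, encoding is UTF-8.
    """
    fragment = raw_html[start:end + 1].decode("utf-8", errors="replace")
    text = TAG_RE.sub("", fragment)  # crude: removes complete tags only
    text = html.unescape(text)  # turn entities such as &amp; into characters
    return " ".join(text.split())  # collapse runs of whitespace
```

Since the decoded JSON stores positions as strings, a call would look like `kf8_highlight_text(raw_html, int(h["startPosition"]), int(h["endPosition"]))`.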

MOBI (azw) format is similar, but there appears to be additional information that I have not attempted to decode.

KFX uses two values separated by a colon. The first is a base64 encoding of the eid and offset, which are fields used internally by KFX to determine the location of content. The second is the actual position number, which in the case of KFX counts visible Unicode characters instead of raw HTML bytes.
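The KFX position strings can be split along these lines. The byte layout used below (one leading byte, then the eid and the offset as unsigned 32-bit little-endian integers) is an inference from the sample output in the first post, not documented behavior. It fits the samples: "ATwDAAAAAAAA:3803" and "ATwDAAADAQAA:4062" decode to eid 828 with offsets 0 and 259, and 4062 - 3803 = 259; likewise "AS0DAAAAAAAA:1696" and "AS0DAADoAAAA:1928" decode to eid 813 with offsets 0 and 232, and 1928 - 1696 = 232. Treat it as a working hypothesis; the names are hypothetical.

```python
import base64
import struct
from collections import namedtuple

KfxPosition = namedtuple("KfxPosition", "prefix eid offset position")


def parse_kfx_position(value):
    """Split a KFX position string such as "ATwDAAADAQAA:4062".

    ASSUMED layout of the base64 part, inferred from sample output and not
    confirmed: 1 byte (value 1 in the first post's samples), then the eid
    and the offset as unsigned 32-bit little-endian integers.
    """
    encoded, _, number = value.partition(":")
    raw = base64.b64decode(encoded)
    if len(raw) != struct.calcsize("<BII"):
        raise ValueError("unexpected encoded length in %r" % value)
    prefix, eid, offset = struct.unpack("<BII", raw)
    return KfxPosition(prefix, eid, offset, int(number))
```

If the hypothesis holds, two positions with the same eid should differ in their position numbers by exactly the difference of their offsets.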

I have not looked into how position numbers are handled in the other formats that Kindle supports.

Last edited by jhowell; 08-14-2019 at 08:16 AM.
11-28-2019, 01:00 AM   #8
paulat
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: Kindle Voyage
Hi, could you please make a Python version of the script that extracts the actual text that was highlighted? j.p.s's code is in C, which is really difficult for me (and, I guess, for most people without sufficient programming knowledge) to use. Thanks.
11-28-2019, 06:52 AM   #9
j.p.s
Grand Sorcerer
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by paulat
Hi, could you please make a Python version of the script that extracts the actual text that was highlighted? j.p.s's code is in C, which is really difficult for me (and, I guess, for most people without sufficient programming knowledge) to use. Thanks.

Have you tried the Perl version, azw3r.pl? It doesn't require compiling, and it somewhat obsoletes the C version since it is more robust.
11-28-2019, 10:49 AM   #10
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
Quote:
Originally Posted by paulat
Hi, could you please make a Python version of the script that extracts the actual text that was highlighted?

I do not plan on taking this any further. Hopefully the Perl script developed by j.p.s will work for you.
12-03-2019, 09:00 AM   #11
paulat
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: Kindle Voyage
Quote:
Originally Posted by j.p.s
Have you tried the Perl version, azw3r.pl? It doesn't require compiling, and it somewhat obsoletes the C version since it is more robust.

Hi, thanks. I have tried it; however, the result is not quite what I had in mind. I did manage to work out a solution on my own, though.

If anyone else has the same need: I used jhowell's script to get the highlights as a JSON file, then used it as an index to extract the highlight text from the "My Clippings.txt" file by matching each highlight's creation time (there seems to be an offset of 1-2 seconds in some cases). Since "My Clippings.txt" is guaranteed to be a superset of the azw3r file, it worked out perfectly! (The reason I don't use "My Clippings.txt" directly is that it also contains obsolete highlights, for example ones that have since been deleted from the book.)
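No code was posted for this, but the approach can be sketched in Python. Several details are assumptions: the "My Clippings.txt" layout (entries separated by a line of ten equals signs, each with a title line and then a metadata line ending in an English-language timestamp such as "Added on Sunday, August 11, 2019 3:24:03 PM"), the use of the word "Highlight" to recognize highlight entries, and that the creationTime values in the JSON use the same time zone as the clippings. Other languages and firmware versions format the metadata line differently, so ADDED_RE and ADDED_FORMAT will likely need adjusting. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

SEPARATOR = "=========="
# ASSUMED English-locale metadata line, for example:
# "- Your Highlight on page 5 | Location 63-64 | Added on Sunday, August 11, 2019 3:24:03 PM"
ADDED_RE = re.compile(r"\|\s*Added on (.+)$")
ADDED_FORMAT = "%A, %B %d, %Y %I:%M:%S %p"


def parse_clippings(text):
    """Split My Clippings.txt content into entries with metadata, time and text."""
    entries = []
    for block in text.split(SEPARATOR):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 2:
            continue  # e.g. the empty tail after the last separator
        match = ADDED_RE.search(lines[1])
        if not match:
            continue  # metadata line not in the assumed format
        entries.append({
            "meta": lines[1],
            "added": datetime.strptime(match.group(1).strip(), ADDED_FORMAT),
            "text": "\n".join(lines[2:]).strip(),
        })
    return entries


def match_highlights(highlights, clippings, tolerance_seconds=2):
    """Pair each decoded highlight with the closest-in-time highlight clipping.

    Returns (highlight, text) pairs; text is None when nothing is within the
    tolerance. Clippings that match no highlight (e.g. deleted ones) are ignored.
    """
    candidates = [c for c in clippings if "Highlight" in c["meta"]]
    tolerance = timedelta(seconds=tolerance_seconds)
    pairs = []
    for h in highlights:
        # Clippings have one-second resolution, so drop the milliseconds.
        created = datetime.fromisoformat(h["creationTime"]).replace(microsecond=0)
        best = min(candidates, key=lambda c: abs(c["added"] - created), default=None)
        if best is not None and abs(best["added"] - created) <= tolerance:
            pairs.append((h, best["text"]))
        else:
            pairs.append((h, None))
    return pairs
```

The highlights argument would be the "annotation.personal.highlight" list from the decoded JSON; the two-second default tolerance reflects the 1-2 second offset mentioned above.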

Last edited by paulat; 12-03-2019 at 09:03 AM.
12-03-2019, 06:33 PM   #12
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
Quote:
Originally Posted by paulat
Hi, thanks. I have tried it; however, the result is not quite what I had in mind. I did manage to work out a solution on my own, though.

I’m glad you came up with a solution that meets your needs.
12-03-2019, 08:11 PM   #13
j.p.s
Grand Sorcerer
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by paulat
Hi, could you please make a Python version of the script that extracts the actual text that was highlighted? j.p.s's code is in C, which is really difficult for me (and, I guess, for most people without sufficient programming knowledge) to use. Thanks.
Quote:
Originally Posted by j.p.s
Have you tried the Perl version, azw3r.pl? It doesn't require compiling, and it somewhat obsoletes the C version since it is more robust.
Quote:
Originally Posted by paulat
Hi, thanks. I have tried it; however, the result is not quite what I had in mind. I did manage to work out a solution on my own, though.

If anyone else has the same need: I used jhowell's script to get the highlights as a JSON file, then used it as an index to extract the highlight text from the "My Clippings.txt" file by matching each highlight's creation time (there seems to be an offset of 1-2 seconds in some cases). Since "My Clippings.txt" is guaranteed to be a superset of the azw3r file, it worked out perfectly! (The reason I don't use "My Clippings.txt" directly is that it also contains obsolete highlights, for example ones that have since been deleted from the book.)

I'm glad you found something that works for you, but you should know that "My Clippings.txt" can cause trouble when it gets large. You might want to copy it to your PC from time to time and delete it on your Kindle.

I also haven't used my highlight tools for a while and had forgotten about krdsJSON2notes.pl, which uses the output of jhowell's krds.py to extract highlights from the book text. krdsJSON2notes.pl is in azw3r-0.17.zip, which is attached to post #1 in my highlight extraction thread.
09-22-2020, 08:15 AM   #14
bopuc
livin' with ebooks
Posts: 29
Karma: 41344
Join Date: Jun 2014
Location: Berlin (DE), winters in Aomori (JP), from Montréal (CA)
Device: Libra2, KOA
Quote:
Originally Posted by paulat
Hi, thanks. I have tried it; however, the result is not quite what I had in mind. I did manage to work out a solution on my own, though.

If anyone else has the same need: I used jhowell's script to get the highlights as a JSON file, then used it as an index to extract the highlight text from the "My Clippings.txt" file by matching each highlight's creation time (there seems to be an offset of 1-2 seconds in some cases). Since "My Clippings.txt" is guaranteed to be a superset of the azw3r file, it worked out perfectly! (The reason I don't use "My Clippings.txt" directly is that it also contains obsolete highlights, for example ones that have since been deleted from the book.)

Could you share your code, please? Others may find it useful too!
07-29-2021, 06:20 AM   #15
Shark69
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
Very interesting functionality. I have been doing a bit of research on the data that comes out, and the most useful thing I have found so far is the following, which I will explain with an example:

Code:
{
    "timer.model": {
        "version": 3,
        "totalTime": 35191528,
        "totalWords": 131355,
        "totalPercent": 1.0758328462887201,
        "averageCalculator": {
            "samples1": [],
            "samples2": [],
            "normalDistributions": [
                {
                    "count": 466,
                    "sum": 102178.8613209594,
                    "sumOfSquares": 23794574.149871096
                },
                {
                    "count": 17,
                    "sum": 9752.422438256466,
                    "sumOfSquares": 5781216.939200989
                }
            ],
            "outliers": [
                [
                    52.47694415740492
                ],
                [
                    833.0194263574235
                ],
                [
                    389.67816758480717,
                    395.84946569614374,
                    416.29497472494796
                ]
            ]
        }
    },
    "fpr": {
        "position": "997716",
        "time": null,
        "timeZoneOffset": null,
        "country": "",
        "device": ""
    },
    "book.info.store": {
        "numberOfWords": 167321,
        "percentOfBook": 1.3649912331969611
    },
    "page.history.store": [
        {
            "position": "77096",
            "time": "2021-04-18T23:56:15.675000"
        },
        {
            "position": "169022",
            "time": "2021-04-21T17:47:22.352000"
        },
        {
            "position": "836132",
            "time": "2021-05-04T00:19:54.125000"
        }
    ],
    "lpr": {
        "position": "997716",
        "time": "2021-05-07T21:49:36.114000"
    },
    "whisperstore.migration.status": [
        false,
        false
    ]
}

totalTime is the total read time in milliseconds.
Once roughly fifteen samples have been collected (a sample is taken on each page turn), statistics start to appear. In each normalDistributions entry, count is the number of samples taken and sum is the sum of their words-per-minute values, so dividing sum by count gives the average reading speed in words per minute. In the example, 102178 / 466 ≈ 219.
outliers are samples that deviate strongly from the mean. This usually happens at the start of a book, when we skip past a lot of front matter before the actual text begins.

Of the normalDistributions entries, the first is made up of normal page reads: from a few tests, it receives pages read at roughly 0 to 800 words per minute. The second is made up of readings faster than about 800 words per minute. These happen when we flip back and forth in the book just to look up something we may have forgotten. That is not normal reading, so these anomalous words-per-minute values go into the separate collection where they do not distort the real data.

If the book is left open without locking the device, reading time keeps being counted until the device locks itself. So, for a more accurate calculation, it is best to lock the device when we are not reading. If we get sleepy and fall asleep, too bad... that counts as slow reading (and it's true, ha ha ha).
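The arithmetic above can be checked with a few lines of Python. This takes the interpretation above at face value (count is the number of page-turn samples and sum is the total of their words-per-minute values) and additionally assumes that sumOfSquares is the sum of the squared values, which would allow a standard deviation to be derived; that last part is an assumption based only on the field name. The overall rate from totalWords and totalTime is shown for comparison.

```python
import math

# Values copied from the "timer.model" example above.
TOTAL_TIME_MS = 35191528
TOTAL_WORDS = 131355
DISTRIBUTIONS = [
    {"count": 466, "sum": 102178.8613209594, "sumOfSquares": 23794574.149871096},
    {"count": 17, "sum": 9752.422438256466, "sumOfSquares": 5781216.939200989},
]


def mean_and_stddev(dist):
    """Mean and population standard deviation from running count/sum/sum-of-squares."""
    mean = dist["sum"] / dist["count"]
    variance = dist["sumOfSquares"] / dist["count"] - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))  # guard against rounding below zero


def overall_wpm(total_words, total_time_ms):
    """Words per minute over the whole recorded reading time."""
    return total_words / (total_time_ms / 60000.0)


for dist in DISTRIBUTIONS:
    mean, sd = mean_and_stddev(dist)
    print("n=%d mean=%.0f wpm sd=%.0f wpm" % (dist["count"], mean, sd))
print("overall: %.0f wpm" % overall_wpm(TOTAL_WORDS, TOTAL_TIME_MS))
```

With these numbers the first distribution averages about 219 words per minute, as stated above, and totalWords over totalTime works out to about 224 words per minute. The second distribution averages about 574 words per minute, so the boundary between the two collections may not be a fixed 800.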

Last edited by Shark69; 07-29-2021 at 01:47 PM.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.