Old 08-14-2019, 08:09 AM   #7
jhowell
Grand Sorcerer
Quote:
Originally Posted by shamanNS View Post
So, this script does not extract the actual text that was highlighted?
That is correct. The script decodes whatever is in the files indicated in the first post of this thread. The reader application has no need to store the actual text separately from the book format file.

The link between the files that this program decodes and the book's content is provided by fields with "position" in their names. These are strings that identify where to find content within a book, and they are interpreted differently for each book format.

KF8 (azw3) format appears to be the simplest case. The position is a decimal number giving a byte offset within the raw HTML content of the book, which can be extracted using the KindleUnpack software. See the work done by j.p.s for an example of how to make use of this information.
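
To illustrate the KF8 case, here is a minimal sketch of how such a position might be used once the raw HTML has been extracted. The function name and the `length` parameter are made up for this example, and it assumes the raw HTML is UTF-8 encoded; it is not taken from the script in this thread or from j.p.s's work.

```python
def context_at_position(raw_html: bytes, position: str, length: int = 80) -> str:
    """Return the text starting at a KF8 position (hypothetical helper).

    raw_html is the book's raw HTML as bytes, position is the decimal
    string found in a "position" field.
    """
    offset = int(position)  # the position is a decimal byte offset
    if not 0 <= offset <= len(raw_html):
        raise ValueError(f"position {offset} is outside the raw HTML")
    snippet = raw_html[offset:offset + length]
    # Assumption: UTF-8 content. A slice may cut a multi-byte character,
    # so undecodable bytes are replaced rather than raising an error.
    return snippet.decode("utf-8", errors="replace")
```

Since a highlight needs a start and an end, slicing between two such offsets would presumably give the raw HTML of the highlighted span, with markup still included.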

MOBI (azw) format is similar, but there appears to be additional information that I have not attempted to decode.

KFX uses two values separated by a colon. The first is a Base64 encoding of the eid and offset, the fields KFX uses internally to locate content. The second is the actual position number, which in KFX counts visible Unicode characters rather than raw HTML bytes.
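
As a rough sketch of the KFX case, the two parts can be separated like this. The function name is hypothetical, and it assumes the standard Base64 alphabet with padding possibly omitted. The decoded bytes are returned as-is because the layout of the eid and offset inside them is not described here.

```python
import base64


def split_kfx_position(value: str) -> tuple[bytes, int]:
    """Split a KFX position string into (decoded bytes, position number).

    Hypothetical helper: value has the form "<base64>:<number>".
    """
    encoded, sep, number = value.partition(":")
    if not sep:
        raise ValueError("expected two values separated by a colon")
    # Assumption: standard Base64; restore any padding that was dropped.
    encoded += "=" * (-len(encoded) % 4)
    eid_and_offset = base64.b64decode(encoded)
    # The second value counts visible Unicode characters, not bytes.
    return eid_and_offset, int(number)
```

The first element of the result still needs format-specific decoding before the eid and offset can be read out of it.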

I have not looked into how position numbers are handled in the other formats that Kindle supports.

Last edited by jhowell; 08-14-2019 at 08:16 AM.