I've figured out enough of the azw3r format to extract personal highlights, notes, and maybe bookmarks. (All strictly by inspection.) I've also written a C program to extract highlights and notes (in a text format possibly most suitable as an intermediate stage) and a perl script that uses the extracted highlights and notes to mark up the rawml for the book. azw3r.pl is a perl alternative to the C program which takes the same arguments and produces the same output. Both of these can now extract highlighted text from the book's rawml file. Both might also be used with yjr files from KFX books, but without the capability to extract highlighted text.
Since jhowell's KRDS parser krds.py (https://www.mobileread.com/forums/sh...d.php?t=322172) is general and complete, I've put the details of my partial reverse engineering in spoiler tags.
Spoiler:
As I write this up, I see that the structures are saved AVL interval trees, which means nothing to me, and the results of a web search don't look interesting. This particular file is a strange mix of binary and text. (Of course the notes are in text, but see the following.)
Each highlight begins (for my purposes) with the string "annotation.personal.highlight" followed by 4 bytes. The first byte is always 0x03 (^C); the next 3 bytes seem to give the length of the text string that follows, which holds the rawml byte offset of the beginning of the highlight. The same pattern (4 bytes, then a text string) repeats to give the byte offset of the end of the highlight, followed by a couple dozen bytes of what is (as far as I am concerned) junk.
Code:
annotation.personal.highlight^C^@^@^G1191325^C^@^@^G1191337^B^@^@^A...
                              3 0 0 7        3 0 0 7
((0*256) + 0)*256 + 7 = 7
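The layout above can be sketched in a few lines of Python. This is only an illustration of the description, not the code from azw3r.c or azw3r.pl; the function names are made up, and it assumes the 3 length bytes are big-endian (as the arithmetic above suggests) and that the offsets are stored as ASCII digits. Anything after the second offset is skipped as "junk".

```python
MARKER = b"annotation.personal.highlight"


def read_counted_string(data, pos):
    """Read one field: a 0x03 byte, a 3-byte big-endian length, then the text.

    Returns (text, position just past the text). Hypothetical helper.
    """
    if data[pos] != 0x03:
        raise ValueError("expected 0x03 at offset %d" % pos)
    length = int.from_bytes(data[pos + 1:pos + 4], "big")
    start = pos + 4
    text = data[start:start + length].decode("ascii")
    return text, start + length


def find_highlights(data):
    """Return a list of (begin, end) rawml byte offsets, one per highlight."""
    highlights = []
    pos = data.find(MARKER)
    while pos != -1:
        cursor = pos + len(MARKER)
        try:
            begin, cursor = read_counted_string(data, cursor)
            end, cursor = read_counted_string(data, cursor)
            highlights.append((int(begin), int(end)))
        except (ValueError, IndexError):
            # The marker string was not followed by the expected fields
            # (assumption: such occurrences can simply be skipped).
            pass
        pos = data.find(MARKER, cursor)
    return highlights


# The example record shown above, written out as bytes.
sample = (b"annotation.personal.highlight"
          b"\x03\x00\x00\x071191325"
          b"\x03\x00\x00\x071191337"
          b"\x02\x00\x00\x01")
print(find_highlights(sample))  # prints [(1191325, 1191337)]
```

On a real file you would read the whole azw3r in binary mode (open(path, "rb").read()) and pass the bytes to find_highlights.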
Personal notes are similar to highlights. They begin with the string "annotation.personal.note", followed by the rawml byte offset of the highlight associated with the note. This is followed by more "junk", then the length of the note (in binary only), then the text of the note itself.
Bookmarks look similar to highlights, but I have not investigated.
The C code and perl scripts are on GitHub at https://github.com/jps-e/azw3r and attached here, along with a sed script to make the rawml viewable in a web browser.
ETA: The C and perl have been updated.
ETA: New release attached as azw3r-0.1.7.zip to this post. See post #29 for details of added features.