MobileRead Forums > E-Book Formats > Kindle Formats

07-27-2019, 01:52 PM   #1
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
azw3r highlight and note extraction info

I've figured out enough of the azw3r format to extract personal highlights, notes, and maybe bookmarks (all strictly by inspection). I've also written a C program that extracts highlights and notes (in a text format that is probably most useful as an intermediate stage), and a perl script that uses the extracted highlights and notes to mark up the rawml for the book. azw3r.pl is a perl alternative to the C program; it takes the same arguments and produces the same output. Both can now extract the highlighted text from the book's rawml file. Both might also work with yjr files from KFX books, but without the ability to extract highlighted text.
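A minimal C sketch of the mark-up idea, not the author's notes_insert.pl: given highlight spans as rawml byte offsets, copy the rawml and wrap each span in tags. The `hl_span` type, the `mark_up` name, the `<mark>` tags, and treating the end offset as exclusive are all assumptions made for illustration; spans are assumed to be sorted by start, non-overlapping, and inside the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* One highlight as a pair of rawml byte offsets (hypothetical type).
   Treating `end` as exclusive is an assumption, not a confirmed detail. */
typedef struct {
    size_t start;
    size_t end;
} hl_span;

/* Return a malloc'd copy of rawml with "<mark>" before and "</mark>" after
   each span, or NULL on allocation failure or a bad span list. Spans must be
   sorted by start, non-overlapping and inside the buffer. Caller frees. */
char *mark_up(const char *rawml, size_t len, const hl_span *s, size_t n)
{
    static const char open_tag[] = "<mark>";
    static const char close_tag[] = "</mark>";
    const size_t olen = sizeof open_tag - 1;
    const size_t clen = sizeof close_tag - 1;

    char *out = malloc(len + n * (olen + clen) + 1);
    if (out == NULL)
        return NULL;

    size_t pos = 0;   /* next rawml byte not yet copied */
    char *w = out;    /* next free byte in out */
    for (size_t i = 0; i < n; i++) {
        if (s[i].start < pos || s[i].end < s[i].start || s[i].end > len) {
            free(out);
            return NULL;   /* unsorted, overlapping or out-of-range span */
        }
        memcpy(w, rawml + pos, s[i].start - pos);               /* text before */
        w += s[i].start - pos;
        memcpy(w, open_tag, olen);
        w += olen;
        memcpy(w, rawml + s[i].start, s[i].end - s[i].start);   /* highlight */
        w += s[i].end - s[i].start;
        memcpy(w, close_tag, clen);
        w += clen;
        pos = s[i].end;
    }
    memcpy(w, rawml + pos, len - pos);   /* tail after the last highlight */
    w[len - pos] = '\0';
    return out;
}
```

Building a fresh buffer in one forward pass keeps every offset valid, since nothing is shifted in place while the spans are being applied.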

Since jhowell's KRDS parser krds.py https://www.mobileread.com/forums/sh...d.php?t=322172 is general and complete, I've put the details of my partial reverse engineering in spoiler tags.
Spoiler:

As I write this up, I see that the structures are saved AVL interval trees, which means nothing to me, and the results of a web search don't look interesting. This particular file is a strange mix of binary and text. (Of course the notes are in text, but see the following.)

Each highlight begins (for my purposes) with the string "annotation.personal.highlight" followed by 4 bytes. The first byte is always 0x03 (^C), and the next 3 bytes seem to give, as a big-endian number, the length of the following text string, which is the rawml byte offset of the beginning of the highlight. The same layout then repeats for the rawml byte offset of the end of the highlight, followed by roughly two dozen bytes of (as far as I am concerned) junk.
Code:
annotation.personal.highlight^C^@^@^G1191325^C^@^@^G1191337^B^@^@^A...
                              3 0 0 7        3 0 0 7
((0*256) + 0)*256 + 7 = 7
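A rough C sketch of decoding one highlight record as described above: after the key string, each offset is a 0x03 byte, a 3-byte big-endian length, and that many ASCII digits. `next_highlight` and `read_offset_field` are hypothetical names, not taken from azw3r.c, and the 9-digit cap is an assumed sanity limit; the trailing "junk" is simply skipped.

```c
#include <string.h>

/* Decode one offset field: a 0x03 byte, a 3-byte big-endian length, then
   that many ASCII digits. Returns the bytes consumed, or 0 on mismatch. */
static size_t read_offset_field(const unsigned char *p, size_t avail,
                                unsigned long *value)
{
    if (avail < 4 || p[0] != 0x03)
        return 0;
    size_t len = ((size_t)p[1] << 16) | ((size_t)p[2] << 8) | (size_t)p[3];
    /* assumed sanity cap: 9 digits always fits in a 32-bit unsigned long */
    if (len == 0 || len > 9 || avail - 4 < len)
        return 0;
    unsigned long v = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = p[4 + i];
        if (c < '0' || c > '9')
            return 0;
        v = v * 10 + (unsigned long)(c - '0');
    }
    *value = v;
    return 4 + len;
}

/* Scan from *pos for "annotation.personal.highlight" and read the start and
   end rawml offsets that follow it. On success store them, move *pos past
   the two fields and return 1; return 0 when no further record parses. */
int next_highlight(const unsigned char *buf, size_t size, size_t *pos,
                   unsigned long *start, unsigned long *end)
{
    static const char key[] = "annotation.personal.highlight";
    const size_t klen = sizeof key - 1;

    for (size_t i = *pos; i + klen <= size; i++) {
        if (memcmp(buf + i, key, klen) != 0)
            continue;
        size_t p = i + klen;
        unsigned long s, e;
        size_t n1 = read_offset_field(buf + p, size - p, &s);
        if (n1 == 0)
            continue;
        p += n1;
        size_t n2 = read_offset_field(buf + p, size - p, &e);
        if (n2 == 0)
            continue;
        *start = s;
        *end = e;
        *pos = p + n2;   /* the trailing "junk" is left for the next scan */
        return 1;
    }
    return 0;
}
```

A caller would start with pos = 0 and call next_highlight() in a loop until it returns 0, printing or storing each start/end pair.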

Personal notes are similar to highlights. They begin with the string "annotation.personal.note", followed by the rawml byte offset of the highlight associated with the note. This is followed by more "junk", then the length of the note (stored only in binary), then the text of the note itself.

Bookmarks look similar to highlights, but I have not investigated.

The C code and perl scripts are on GitHub at https://github.com/jps-e/azw3r and are attached here, along with a sed script that makes the rawml viewable in a web browser.

ETA: The C and perl have been updated

ETA: New release attached as azw3r-0.1.7.zip to this post. See post #29 for details of added features.
Attached Files
File Type: gz notes_insert.pl.gz (492 Bytes, 887 views)
File Type: gz unxml.sed.gz (78 Bytes, 1023 views)
File Type: gz azw3r.pl.gz (822 Bytes, 862 views)
File Type: gz azw3r.c.gz (1.0 KB, 906 views)
File Type: zip azw3r-0.1.7.zip (4.2 KB, 760 views)

Last edited by j.p.s; 09-07-2019 at 06:25 PM. Reason: New release 0.1.7
07-28-2019, 01:47 PM   #2
NiLuJe
BLAM!
Posts: 13,506
Karma: 26047202
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
Not being a Java guy at all, I've always wondered if those (and a few other things) weren't some weird Java binary storage/serialized format...
07-28-2019, 02:06 PM   #3
lumpynose
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by NiLuJe View Post
Not being a Java guy at all, I've always wondered if those (and a few other things) weren't some weird Java binary storage/serialized format...
Why Java? (I don't know enough about kindle files to know what the connection might be.)

I would doubt that it's Java serialization since that is rather fragile; a slight change to a class could break compatibility. But for other binary encodings, who knows.
07-28-2019, 02:42 PM   #4
NiLuJe
BLAM!
Posts: 13,506
Karma: 26047202
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
Because most of the Kindle backend is in Java.
07-28-2019, 04:38 PM   #5
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by NiLuJe View Post
Not being a Java guy at all, I've always wondered if those (and a few other things) weren't some weird Java binary storage/serialized format...
Also not a Java guy, and wondered the same thing for the same reasons.
08-09-2019, 02:31 PM   #6
PoP
 curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,021
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
Quote:
Originally Posted by j.p.s View Post
I've figured out enough of the azw3r format to extract personal highlights, notes, and maybe bookmarks. ...
Just to note: for *.kfx books, *.yjr and *.yjf files (similar in content to *.azw3r and *.azw3f) are created in the *.sdr folder.
08-10-2019, 02:17 AM   #7
ilovejedd
hopeless n00b
Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Awesome work! Question though, how do you use this (syntax)? I'm assuming Linux only? Will this work on a Linux LiveUSB?

Thanks!
08-10-2019, 08:41 AM   #8
PoP
 curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,021
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
En passant, I've also been using Kindle Mate to store notes, highlights and vocabulary builder words (but not bookmarks).
08-10-2019, 09:37 AM   #9
ilovejedd
hopeless n00b
Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by PoP View Post
En passant, I've also been using Kindle Mate to store notes, highlights and vocabulary builder words (but not bookmarks).
AFAIK, that uses "My Clippings.txt", so it only works for highlights created on the e-ink devices. It doesn't work for highlights created in the iOS/Android app and synced to an e-ink Kindle.
08-10-2019, 12:26 PM   #10
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by ilovejedd View Post
Same sdr folder. Just named mbp1 for MOBI and yjr for KFX
Quote:
Originally Posted by PoP View Post
Just to note that (similar in content to *.azw3r and *.azw3f), for *.kfx books, *.yjr and *.yjf files are created in the *.sdr folder.
Thanks to you both.

And it looks like older firmware uses mbp for MOBI.

I played around with them a bit, and their formats for highlights and notes are all different and deserve their own threads. Once they are all sorted out, maybe someone can start a thread for an application that handles all of them automatically. In all cases, there is "junk" between the note header and the text of the note.

I might edit this post later to show excerpts inside spoiler tags.
08-10-2019, 01:54 PM   #11
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by ilovejedd View Post
Awesome work! Question though, how do you use this (syntax)? I'm assuming Linux only? Will this work on a Linux LiveUSB?

Thanks!
The C code needs to be compiled. It is not Linux-specific, but the POSIX mmap() call might not be supported everywhere. It should work fine on a Linux LiveUSB, either with a C compiler or with a binary compiled on a compatible Linux system.

Microsoft has been misleadingly claiming POSIX compliance for decades, but it is my understanding that the Windows Subsystem for Linux (WSL) is the real deal.

If you have a C compiler and it doesn't work, I can make a small change so that the program just reads the entire azw3r file into a buffer, since it is very unlikely that one would ever be too large for that.
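If mmap() is unavailable, the fallback described here (reading the whole file into a buffer) could look roughly like this portable C sketch. It uses only standard C stdio and grows the buffer with realloc() instead of relying on fseek()/ftell() to learn the size; `read_whole_file` is a hypothetical name, not a function from azw3r.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer, as a portable fallback for
   mmap(). Returns NULL on error; otherwise stores the byte count in *size.
   The caller must free() the result. */
unsigned char *read_whole_file(const char *path, size_t *size)
{
    FILE *f = fopen(path, "rb");   /* "b": no newline translation on Windows */
    if (f == NULL)
        return NULL;

    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        if (len == cap) {                 /* grow geometrically */
            size_t ncap = cap ? cap * 2 : 64 * 1024;
            unsigned char *nbuf = realloc(buf, ncap);
            if (nbuf == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = nbuf;
            cap = ncap;
        }
        size_t want = cap - len;
        size_t got = fread(buf + len, 1, want, f);
        len += got;
        if (got < want)                   /* end of file or read error */
            break;
    }
    if (ferror(f)) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size = len;
    return buf;
}
```

Since azw3r files are small, the simple doubling strategy should rarely need more than a couple of reallocations.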

Maybe a pythonista will come along and crank out a python equivalent or improvement.

To compile:
Code:
cc -o azw3r azw3r.c
To run (the 1st is for notes only, the 2nd for highlights only, the 3rd for both highlights and notes, and the 4th sorts the notes by where they are in the book):
Code:
azw3r -i name.azw3r > name.notes
azw3r -h -i name.azw3r > name.highlights
azw3r -h -n -i name.azw3r > name.notes
azw3r -i name.azw3r | sort -n > name.notes
Example output:
Code:
97434   97443   Note:   'Not correct definition for this book.'
114792  114796  Note:   'Should be in x-ray terms category.'
135617  135632  Note:   'Same as Tut'
533488  533494  Note:   'Not a person.'
553723  553726  Note:   'Not a podcast.'
712228  712235  Note:   'Not a video game.'
The output of the C program can be the end product, or it can be processed into some other format or used as input to another program, such as the perl script attached to the first post that inserts the notes and/or highlights into the KindleUnpack rawml output of the book.
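For a program that consumes this output, a line could be split back into its fields with a small C parser like the sketch below. It assumes the columns are tab-separated and the note text is wrapped in single quotes, as the example output appears to show; the `azw3r_line` struct, its field sizes, and `parse_azw3r_line` are hypothetical. The layout of -h highlight lines was not shown, so it is not covered.

```c
#include <stdlib.h>
#include <string.h>

/* One line of azw3r output, e.g. "97434\t97443\tNote:\t'Same as Tut'".
   Tab separators and the quote wrapping are read off the example output,
   not from any specification. */
typedef struct {
    unsigned long start;   /* rawml byte offset where the annotation begins */
    unsigned long end;     /* rawml byte offset where it ends */
    char kind[16];         /* e.g. "Note:" */
    char text[512];        /* note text without the surrounding quotes */
} azw3r_line;

/* Returns 1 on success, 0 if the line doesn't have the expected shape. */
int parse_azw3r_line(const char *line, azw3r_line *out)
{
    char *p;
    out->start = strtoul(line, &p, 10);
    if (p == line || *p != '\t')
        return 0;
    const char *q = p + 1;
    out->end = strtoul(q, &p, 10);
    if (p == q || *p != '\t')
        return 0;
    q = p + 1;

    const char *tab = strchr(q, '\t');
    if (tab == NULL || (size_t)(tab - q) >= sizeof out->kind)
        return 0;
    memcpy(out->kind, q, (size_t)(tab - q));
    out->kind[tab - q] = '\0';

    q = tab + 1;
    size_t n = strcspn(q, "\r\n");   /* ignore a trailing newline */
    if (n >= 2 && q[0] == '\'' && q[n - 1] == '\'') {
        q++;      /* drop the opening quote */
        n -= 2;   /* and the closing one */
    }
    if (n >= sizeof out->text)
        n = sizeof out->text - 1;    /* truncate very long notes */
    memcpy(out->text, q, n);
    out->text[n] = '\0';
    return 1;
}
```

Parsed records could then be sorted on `start` with qsort(), which mirrors the `sort -n` pipeline in the run examples.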
08-11-2019, 06:11 PM   #12
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
I've attached a perl script, azw3r.pl, to the first post as the gzip'd azw3r.pl.gz. It provides the same functionality as the azw3r.c program and should run on any platform that has perl installed. Same syntax, e.g.
Code:
azw3r.pl -i name.azw3r > name.notes
or
perl azw3r.pl -i name.azw3r > name.notes
I think I can get something that works for yjr and maybe mbp1, but won't be able to start on it until next weekend.

Last edited by j.p.s; 08-11-2019 at 06:14 PM.
08-11-2019, 08:18 PM   #13
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
Update:

The azw3r C program and perl script, as is, work for KFX yjr files for highlights or notes separately, that is, using only the -n or the -h option and not both at the same time. It is OK for the yjr file to have both highlights and notes in it. (The C program works with both options at the same time on azw3r files.)

The perl script, unlike the C program, does work fine for listing both highlights and notes at the same time for both yjr and azw3r files.

The perl script notes_insert.pl is not able to process the listings for KFX yjr files.

The perl script azw3r.pl probably does not work for notes longer than 255 characters on any file type. This should be easy to fix.
08-13-2019, 03:28 PM   #14
Luca2903
Junior Member
Posts: 9
Karma: 10
Join Date: Jul 2019
Device: Kindle PW 2
help please

Hi JPS, very interesting work.

Could you please be so kind as to help me a little?

I have this problem here, and I'd like to know whether your solution can help me.

https://www.mobileread.com/forums/sh...44#post3878444

Thanks!
08-13-2019, 06:10 PM   #15
j.p.s
Grand Sorcerer
Posts: 5,833
Karma: 104935873
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by Luca2903 View Post
Hi JPS, very interesting work.

Could you please be so kind as to help me a little?

I have this problem here, and I'd like to know whether your solution can help me.

https://www.mobileread.com/forums/sh...44#post3878444

Thanks!
My method does not interact with Amazon servers; it extracts notes from the files in the .sdr directories for the books on your Kindle into plain-text output. Depending on how pretty you want the notes to be formatted, my method might work for you.

You might also look at jhowell's Kindle Reader Data Store (KRDS) parser https://www.mobileread.com/forums/sh...d.php?t=322172

Last edited by j.p.s; 08-17-2019 at 01:55 PM. Reason: correct typo word -> work

Tags
azw3r, highlights, highlights and notes, notes


Similar Threads
Thread Thread Starter Forum Replies Last Post
Fully Automated ebook file parsing, ISBN extraction, Titel Extraction and metadata isbnread Reading and Management 0 02-20-2017 10:20 AM
Paperwhite 2 add note without highlight? just_jeepin Amazon Kindle 3 10-07-2013 02:07 PM
PRS-650 Two years late — A crossplatform ePub highlight extraction tool for PRS-350, 650... Syniurge Sony Reader 1 09-30-2013 12:45 PM
eink device with note and highlight sync with Mendeley aldomenguzzi Which one should I buy? 0 12-04-2012 04:44 AM


All times are GMT -4.