Old 04-12-2016, 04:14 PM   #1
blaenk
Connoisseur
Posts: 53
Karma: 118948
Join Date: Jul 2014
Device: Kindle PaperWhite 3
reverse engineering azw3r?

I've been reading a book for months now and making important highlights as I go, and yesterday I realized that all highlights from the first 60% of the book were gone (I'm at around 80% now).

I mounted the Kindle and found the .azw3r file, which AFAIK contains the offsets for the highlights, but also an .azw3r.bad_file file alongside it. So somehow it became corrupted? I have no idea.

I opened them both in a hex editor and I definitely see data in both. I have no experience with reverse engineering, but I'm willing to put in the work if it means that I may have a chance to recover those highlights.
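One low-effort way to start, before any real reverse engineering, is to find where the two files stop agreeing and dump the bytes around that point. The sketch below assumes nothing about the azw3r format itself; the function names and the two paths passed to compare_files are placeholders, not anything taken from the Kindle.

```python
from pathlib import Path


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def hexdump(data: bytes, start: int = 0, length: int = 64, width: int = 16) -> str:
    """Format data[start:start+length] as offset, hex bytes, and printable ASCII."""
    lines = []
    chunk = data[start:start + length]
    for off in range(0, len(chunk), width):
        row = chunk[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + off:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


def compare_files(good_path: str, bad_path: str) -> None:
    """Print both file sizes and a dump of each file around the first differing byte."""
    good = Path(good_path).read_bytes()
    bad = Path(bad_path).read_bytes()
    split = common_prefix_len(good, bad)
    print(f"sizes: {len(good)} vs {len(bad)}, first difference at offset {split}")
    dump_from = max(split - 16, 0)  # show one row of shared context first
    print(hexdump(good, dump_from))
    print(hexdump(bad, dump_from))
```

If the two files share a long prefix, that shared part is a reasonable candidate for a header, and the readable ASCII column may reveal field names or markers worth searching for.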

Preferably I'd love to merge both. Alternatively, I could fix and use the old one, since it should have far more highlights: I'd duplicate the book under a different name, give that copy the newer .azw3r, and then manually add the new highlights to the old one.

I guess I'm mainly curious if anyone has any idea what kind of format the file is in. I imagine it should be the same or similar to the other azw3 metadata files? Is it some serialization format similar to protocol buffers, BSON, or Thrift? Or is it some other custom format? And does anyone know the way in which the data is arranged? Are the highlights stored from earliest to latest in the book? Could there be a way to determine why the .bad_file one is considered to be a bad file, in the hopes that I may fix it?

I'm mainly looking for some direction to get started with this, as I've never done this sort of thing. Or if someone knows that it's hopeless, then I won't bother.

As an aside, the reason these highlights are important to me is that I read through the book making highlights as I go, then I go back and write up notes based on the highlights (i.e. the important bits). This allows me to read at a faster pace than I would if I had to write notes as I went along. It has worked perfectly for dozens of books that I've read before; I just never imagined that the rug would be pulled from under me like this!
Old 04-12-2016, 05:47 PM   #2
geekmaster
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773668
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
Hope this helps:
Quote:
KF8 (also called AZW3) is basically a compiled ePub that has been compiled using a Palm database and Amazon's DRM scheme. It is targeting ePub 3 version support but existing ePub can also be used. The latest KindleGen software will create KF8. KF8 replaces MOBI but actually includes both a MOBI database and a KF8 database in the same file for backwards compatibility using older Amazon Kindle readers. ... The internal format has been decoded using kindle unpack. It reveals that the basic format remains as a PDB file similar in structure to MOBI. Generally a KF8 file may contain both a MOBI file at the beginning and the newer KF8 version of ePub later. This of course increased the size of the book file although there is some attempt to share resources such as images between the two objects. Kindle Unpack can be used to separate the two structures by building a traditional MOBI file and a KF8 file without the MOBI part (actually a small dummy structure remains) to make the file smaller. There are flags near the start of the database that can be used to identify the type of file. In spite of the fact that KF8 is targeted at ePub 3 source it still carries forward some HTML constructions that have been de-standardized for years and left over from MOBI's use. Amazon does publish the HTML statements and CSS3 statements that it will recognize. In addition KF8 has extended the ePub 3 format in incompatible ways to support its fixed layout option. All Kindle products beginning with the Kindle 3 can support KF8, although the Kindle 3 requires an update to do so.
Quote:
Kindle unpack (originally called Mobi unpack), also known as Mobi decode, is a python script that creates MOBI or AZW (Amazon Kindle) source files from the compiled database. The filenames used in the source file are not necessarily the same as those that were originally used to create the database as this information is not preserved in the database but an unpacked set of files should be able to be used to recreate the same database using standard mobi or Kindle generating tools. For KF8 files and combined Mobipocket and KF8 files built by KindleGen, it also can produce separated mobipocket and KF8 files, and also the original source files if those are included in the eBook. In addition, for KF8 files it can produce an 'ePub', although if the HTML isn't compliant with ePub standards, the 'ePub' won't be either. For Amazon's .azw4 files, it will extract the PDF that's been wrapped up in Amazon's .azw4 file format. A Calibre plugin version of the scripts is available in this thread.
The trailing 'r' on the file extension means it is a "sidecar" (metadata) file, which in this case may contain your added notes. Here is a related mobileread discussion regarding sidecar (including azw3r) files: https://www.mobileread.com/forums/sho...d.php?t=262356
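On the "flags near the start of the database" mentioned in the first quote: a Palm database header keeps a 4-byte type and a 4-byte creator at byte offsets 60 and 64, and MOBI/KF8 books carry "BOOK" and "MOBI" there. A minimal sketch for reading them follows; pdb_identity is a made-up helper name. Whether the .azw3r sidecar uses the same container is an open question, so this only tells you whether a given file looks like a PDB at all.

```python
import struct


def pdb_identity(header: bytes) -> tuple[str, str, str]:
    """Return (name, type, creator) from the first 68 bytes of a Palm database header."""
    if len(header) < 68:
        raise ValueError("need at least 68 bytes of header")
    # The database name is a NUL-padded 32-byte field at the very start.
    name = header[:32].split(b"\x00", 1)[0].decode("latin-1")
    # Type and creator are two 4-byte ASCII codes at offsets 60 and 64.
    type_, creator = struct.unpack_from(">4s4s", header, 60)
    return name, type_.decode("latin-1"), creator.decode("latin-1")
```

Usage would be something like pdb_identity(open(path, "rb").read(68)); a result other than ("...", "BOOK", "MOBI") for the sidecar would suggest it is some other format.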

Last edited by geekmaster; 04-12-2016 at 06:07 PM.
Old 04-12-2016, 07:16 PM   #3
blaenk
Thanks! But the linked thread is one that I created, and unfortunately doesn't contain much information about the metadata file which contains the highlights (azw3r AFAIK).

As for the quotes, thanks as well, but I have a feeling that KF8/kindleunpack is independent of the highlights, no? That is, the highlights are stored outside of the actual azw3 book, in the sdr folder's azw3r file.
Old 04-12-2016, 08:21 PM   #4
knc1
Going Viral
Posts: 17,212
Karma: 18210809
Join Date: Feb 2012
Location: Central Texas
Device: No K1, PW2, KV, KOA
Isn't the highlights file just offsets and lengths into the text of the book?

If that is the case, then you will have to unpack the book to reach that text content.
I.e., they will not be offsets and lengths into the packed file.
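If the guess above holds, turning a recovered (offset, length) pair back into highlight text is just a slice of the unpacked book. A minimal sketch, assuming the offsets are byte offsets into UTF-8 text (they could equally be character offsets, or relative to something else entirely); extract_highlights is a hypothetical helper, not part of any Kindle tool.

```python
def extract_highlights(text: bytes, spans: list[tuple[int, int]]) -> list[str]:
    """Slice (offset, length) pairs out of the unpacked book text."""
    out = []
    for offset, length in spans:
        # Reject spans that cannot refer to this text; a failure here hints
        # that the offsets are relative to something other than the raw text.
        if offset < 0 or length < 0 or offset + length > len(text):
            raise ValueError(f"span ({offset}, {length}) falls outside the text")
        out.append(text[offset:offset + length].decode("utf-8", errors="replace"))
    return out
```

Checking a known highlight this way (one you can still see on the device) is a quick test of whether a candidate pair of numbers in the sidecar really is an offset and a length.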
Old 04-13-2016, 01:27 PM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I did find references for the older .mbp format (used for storing annotations data for MOBI books).
http://www.angelfire.com/ego2/idleloop/mbp_reader.html

It is possible you can get some idea how it works/worked from there.

As a general rule, lab126 is fond of their byte offsets.
But no one has really devoted time to exploring the wonders of the annotations files....
Old 04-13-2016, 01:56 PM   #6
geekmaster
Here is a kindle highlights fetcher (with source code):

https://rubygems.org/gems/kindle-highlights

https://github.com/speric/kindle-highlights

It may have some code snippets to guide you in your "missing highlights" recovery efforts.

One note of interest regarding this tool:
Quote:
Amazon will sometimes issue a CAPTCHA challenge when logging in to your Kindle account. If this happens when the gem attempts to log in to your Kindle account to retrieve your book list or highlights, you'll get a KindleHighlights::Client::CaptchaError ... There's no way to programmatically resolve this situation. The best solution I've found is to open a browser, visit the URL that the gem returns, log in to your Kindle account, and click around a bit. Then log out of your Kindle account and re-attempt to fetch your highlights via this gem.

Last edited by geekmaster; 04-13-2016 at 02:02 PM.
Old 04-17-2016, 03:53 AM   #7
Yourcat
Groupie
Posts: 175
Karma: 54048
Join Date: Mar 2016
Device: PW3 5.6.5-usbnet
You may ask Amazon whether they can fix the file. Hacking the format may take longer than re-reading the book. As you have backups of both files (and hopefully everything else) you could try to rename the bad file to .azw3r and hope that it can be read by your PW after a restart/reboot.
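The rename experiment can be scripted so that the backup always happens first. This is only a sketch: swap_in_bad_file and both directory arguments are placeholders, and it assumes the .sdr folder holds exactly one file ending in .azw3r.bad_file, as described in the first post.

```python
import shutil
from pathlib import Path


def swap_in_bad_file(sdr_dir: str, backup_dir: str) -> Path:
    """Back up every file in the .sdr folder, then copy the .bad_file over the .azw3r."""
    sdr = Path(sdr_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    for f in sdr.iterdir():
        if f.is_file():
            shutil.copy2(f, backup / f.name)

    bad_files = list(sdr.glob("*.azw3r.bad_file"))
    if len(bad_files) != 1:
        raise FileNotFoundError(f"expected one .azw3r.bad_file, found {len(bad_files)}")
    bad = bad_files[0]
    target = bad.with_suffix("")  # drops ".bad_file", leaving "name.azw3r"
    shutil.copy2(bad, target)     # overwrites the current sidecar; the backup keeps it
    return target
```

Run it against the mounted Kindle with the backup directory on your computer, then eject and restart the device. If the reader rejects the file again, the original sidecar can be copied back from the backup.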
All times are GMT -4.