Not sure if anyone has investigated this, but when I mount the reader, the internal storage has this file:
database/cache/cacheExt.xml
The file looks like it contains all the highlights that have been made.
For example, here is one of them (for a PDF file):
Code:
<text path="Reference/AccuRev_User_CLI.pdf">
  <markups>
    <annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
      <end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
      <start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
      <comment>this is a test note</comment>
    </annotation>
  </markups>
</text>
Most of the tags and attributes are pretty straightforward:
- the path to the file (path attribute of text element)
- the time the highlight was made (date attribute of annotation element)
- the first 100 characters of the highlight (name attribute)
- the page of the book the highlight is on (page attribute), what looks like the total page count (pages attribute), etc.
- and of course, the note associated with the highlight, if any. (comment element)
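If anyone wants to pull these fields out with a script, here is a minimal Python sketch. It only assumes the layout shown in the sample above (a text element containing markups and annotation elements). I don't know what the root element of the real cacheExt.xml looks like or whether it uses an XML namespace, so the code just walks every element and ignores any namespace prefix. The function names are my own, not anything from the reader.

```python
import xml.etree.ElementTree as ET

# The sample entry from this post, used as test input.
SAMPLE = """<text path="Reference/AccuRev_User_CLI.pdf">
<markups>
<annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
<end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
<start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
<comment>this is a test note</comment>
</annotation>
</markups>
</text>"""


def local(tag):
    """Drop an XML namespace prefix, e.g. '{uri}text' -> 'text'."""
    return tag.rsplit("}", 1)[-1]


def list_highlights(xml_text):
    """Return one dict per annotation found under any text element."""
    root = ET.fromstring(xml_text)
    found = []
    for text in root.iter():  # iter() includes the root itself
        if local(text.tag) != "text":
            continue
        for node in text.iter():
            if local(node.tag) != "annotation":
                continue
            # Child elements: start, end, and (optionally) comment.
            fields = {local(child.tag): (child.text or "") for child in node}
            found.append({
                "path": text.get("path"),
                "date": node.get("date"),
                "name": node.get("name"),
                "page": node.get("page"),
                "start": fields.get("start"),
                "end": fields.get("end"),
                "comment": fields.get("comment"),  # None if no note
            })
    return found


if __name__ == "__main__":
    for h in list_highlights(SAMPLE):
        print(h["path"], "page", h["page"], "-", h["comment"])
```

For the real file you would read database/cache/cacheExt.xml from the mounted reader and pass its contents to list_highlights.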
The start and end elements seem to be Base64-encoded. Decoded, the text is:
- start: #pdfloc(870b,6,127,0,0,0,0,1)
- end: #pdfloc(870b,6,165,0,31,0,1,1)
The second field, 6, matches the page attribute of the annotation, so it is presumably the page of the book (0-indexed). I'm not sure about the rest yet. More experimentation is needed.
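For anyone who wants to repeat the decoding, here is a small Python sketch. It Base64-decodes the value, strips the trailing NUL byte that both sample strings end with, and splits the #pdfloc(...) arguments into a list. The function name decode_pdfloc is my own, and the code attaches no meaning to the individual fields since most of them are still unknown.

```python
import base64


def decode_pdfloc(encoded):
    """Decode a start/end value and return the #pdfloc fields as strings."""
    raw = base64.b64decode(encoded)
    # Both samples decode to ASCII text followed by a single NUL byte.
    text = raw.rstrip(b"\x00").decode("ascii")
    prefix, _, rest = text.partition("(")
    if prefix != "#pdfloc" or not rest.endswith(")"):
        raise ValueError("not a #pdfloc value: %r" % text)
    return rest[:-1].split(",")


if __name__ == "__main__":
    start = decode_pdfloc("I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA")
    end = decode_pdfloc("I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==")
    print("start:", start)
    print("end:  ", end)
```

Comparing the two lists side by side for a few more highlights should make it easier to see which fields change with the selection.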
For LRF files, the same elements seem to decode to a binary format rather than text. For example, "QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=" decodes to:
00000000 42 65 62 42 20 00 00 00 27 3d 52 07 04 00 00 00 |BebB ...'=R.....|
00000010 00 00 00 00 16 e4 02 00 00 00 00 00 00 00 00 00 |................|
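A dump like the one above can be reproduced with a few lines of Python. This is only a sketch for poking at the bytes: the hexdump helper is my own, and it makes no assumption about what the fields mean beyond the fact that the value starts with the four bytes "BebB".

```python
import base64


def hexdump(data, width=16):
    """Return hexdump lines: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Non-printable bytes are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so a short final row still lines up.
        lines.append("%08x %-*s |%s|" % (offset, width * 3 - 1, hex_part, ascii_part))
    return lines


if __name__ == "__main__":
    raw = base64.b64decode("QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=")
    print("\n".join(hexdump(raw)))
```

Dumping the start and end values of several LRF highlights this way and diffing them would be a reasonable next step for working out the layout.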
Guess this is a start for someone to chew on. I'll probably look into it more as the days/weeks go on.