02-27-2009, 11:27 PM   #27
kitlaan
Junior Member
Posts: 4
Karma: 18
Join Date: Nov 2008
Device: PRS-700
database/cache/cacheExt.xml ?

Not sure if anyone has investigated this, but when I mount the reader, the internal storage has this file: database/cache/cacheExt.xml

The file looks like it contains all the highlights that have been made.

For example, here is one of them (for a PDF file):
Code:
<text path="Reference/AccuRev_User_CLI.pdf">
	<markups>
		<annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
			<end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
			<start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
			<comment>this is a test note</comment>
		</annotation>
	</markups>
</text>
Most of the tags are pretty straightforward.
  • the path to the file (path attribute of text element)
  • the time the highlight was made (date attribute of annotation element)
  • the first 100 characters of the highlight (name attribute)
  • the page of the book the highlight is on (page attribute), etc.
  • and of course, the note associated with the highlight, if any. (comment element)
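A rough sketch of reading those fields with Python's standard xml.etree.ElementTree, using the sample entry above. The helper name list_highlights is made up for illustration, and I'm only parsing the single <text> element shown here; I haven't confirmed how a full cacheExt.xml is laid out around it.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the post. Only this one <text> element is known;
# the surrounding structure of a full cacheExt.xml is not shown in the post.
SAMPLE = """<text path="Reference/AccuRev_User_CLI.pdf">
	<markups>
		<annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
			<end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
			<start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
			<comment>this is a test note</comment>
		</annotation>
	</markups>
</text>"""


def list_highlights(text_elem):
    """Return one dict per <annotation> found under a <text> element."""
    highlights = []
    for ann in text_elem.iter("annotation"):
        highlights.append({
            "path": text_elem.get("path"),        # file the highlight belongs to
            "date": ann.get("date"),              # when the highlight was made
            "page": int(ann.get("page")),         # page attribute, as stored
            "excerpt": ann.get("name"),           # start of the highlighted text
            "comment": ann.findtext("comment"),   # None when there is no note
            "start": ann.findtext("start"),       # still Base64 at this point
            "end": ann.findtext("end"),
        })
    return highlights


highlights = list_highlights(ET.fromstring(SAMPLE))
for h in highlights:
    print(h["path"], "page", h["page"], "-", h["comment"])
```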

The start and end elements appear to be Base64-encoded. Decoded, the text is:
  • start: #pdfloc(870b,6,127,0,0,0,0,1)
  • end: #pdfloc(870b,6,165,0,31,0,1,1)
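The decoding can be reproduced with a few lines of Python. This is only a sketch: decode_pdfloc is a name I made up, and splitting on commas assumes the fields never contain a comma themselves. One detail worth noting is that the decoded bytes end in a NUL byte, which is stripped here.

```python
import base64


def decode_pdfloc(encoded):
    """Base64-decode a start/end value and return (text, list_of_fields)."""
    raw = base64.b64decode(encoded)
    # The decoded bytes carry a trailing NUL terminator; drop it.
    text = raw.rstrip(b"\x00").decode("ascii")
    if not (text.startswith("#pdfloc(") and text.endswith(")")):
        raise ValueError("not a #pdfloc value: %r" % text)
    # Assumption: fields are plain comma-separated tokens.
    fields = text[len("#pdfloc("):-1].split(",")
    return text, fields


start_text, start_fields = decode_pdfloc("I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA")
end_text, end_fields = decode_pdfloc("I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==")
print(start_text, start_fields)
print(end_text, end_fields)
```

Keeping the fields as strings is deliberate, since the first one ("870b") is not a plain decimal number and the meaning of most fields is still unknown.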

The ",6," matches the page attribute of the annotation, so it is presumably the page of the book (0-indexed). I'm not sure about the rest yet; more experimentation is needed.


LRF files seem to decode differently, into a binary format rather than text. For example, "QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=" decodes to:

00000000 42 65 62 42 20 00 00 00 27 3d 52 07 04 00 00 00 |BebB ...'=R.....|
00000010 00 00 00 00 16 e4 02 00 00 00 00 00 00 00 00 00 |................|
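For anyone who wants to poke at the LRF values, here is a small Python sketch that decodes the same string and prints the bytes after the 4-byte "BebB" prefix as 32-bit words. Reading them as little-endian unsigned integers is purely my guess at the layout, not something I've confirmed. Read that way, the first word is 32, which happens to equal the total length of the blob.

```python
import base64
import struct

raw = base64.b64decode("QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=")

magic = raw[:4]  # ASCII "BebB" in the hex dump

# Assumption: everything after the prefix is a run of little-endian
# unsigned 32-bit words. The real field layout is unknown.
count = (len(raw) - 4) // 4
words = struct.unpack("<%dI" % count, raw[4:4 + 4 * count])

print(magic, len(raw))
for offset, word in zip(range(4, 4 + 4 * count, 4), words):
    print("offset %2d: 0x%08x (%d)" % (offset, word, word))
```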



Guess this is a start for someone to chew on. I'll probably look into it more as the days/weeks go on.