MobileRead Forums - View Single Post

kitlaan · 02-28-2009, 12:27 AM

Not sure if anyone has investigated this, but when I mount the reader, the internal storage has this file: database/cache/cacheExt.xml

The file looks like it contains all the highlights that have been made.

For example, here is one of them (for a PDF file):

Code:

<text path="Reference/AccuRev_User_CLI.pdf">
	<markups>
		<annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
			<end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
			<start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
			<comment>this is a test note</comment>
		</annotation>
	</markups>
</text>

Most of the tags are pretty straightforward:

- the path to the file (path attribute of the text element)
- the time the highlight was made (date attribute of the annotation element)
- the first 100 characters of the highlight (name attribute)
- which page of the book the highlight is on (page attribute), etc.
- and of course, the note associated with the highlight, if any (comment element)
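
For anyone who wants to poke at the file programmatically, here is a rough Python sketch that pulls those fields out of the sample entry above. It assumes the real cacheExt.xml wraps many text elements in some root element and uses no XML namespace; I have only the one entry to go on, so treat `read_annotations` and `SAMPLE` as illustrative names rather than anything the reader defines.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# The sample entry from the post. Assumption: the real cacheExt.xml wraps
# many <text> elements in a root element and uses no XML namespace.
SAMPLE = """<text path="Reference/AccuRev_User_CLI.pdf">
  <markups>
    <annotation date="Sat, 28 Feb 2009 04:05:21 GMT" name="The AccuRev command line interface is implemented by a program named accurev. You can use this tool" page="6" pageOffset="0" pages="256" part="0" scale="0" synced="true">
      <end>I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==</end>
      <start>I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA</start>
      <comment>this is a test note</comment>
    </annotation>
  </markups>
</text>"""


def read_annotations(xml_text):
    """Return one dict per <annotation>, tagged with its book's path."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() also visits the root itself, so a lone <text> element works too.
    for text in root.iter("text"):
        for ann in text.iter("annotation"):
            found.append({
                "path": text.get("path"),
                # The date attribute is in the RFC 2822 style used by email.
                "date": parsedate_to_datetime(ann.get("date")),
                "name": ann.get("name"),
                "page": int(ann.get("page")),
                "start": ann.findtext("start"),
                "end": ann.findtext("end"),
                "comment": ann.findtext("comment"),  # None if there is no note
            })
    return found


for ann in read_annotations(SAMPLE):
    print(ann["path"], ann["page"], ann["date"].isoformat(), ann["comment"])
```

The start and end values are left as the raw encoded strings here.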

The start and end elements seem to be Base64 encoded. Decoded, the text is:

start: #pdfloc(870b,6,127,0,0,0,0,1)
end: #pdfloc(870b,6,165,0,31,0,1,1)
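
If you want to reproduce the decoding, a minimal Python sketch follows. The helper name `decode_location` is my own. One detail worth knowing: both sample values decode with a trailing NUL byte, which the sketch strips before printing.

```python
import base64


def decode_location(encoded):
    """Base64-decode a <start>/<end> value and drop the trailing NUL byte."""
    raw = base64.b64decode(encoded)
    return raw.rstrip(b"\x00").decode("ascii")


start = decode_location("I3BkZmxvYyg4NzBiLDYsMTI3LDAsMCwwLDAsMSkA")
end = decode_location("I3BkZmxvYyg4NzBiLDYsMTY1LDAsMzEsMCwxLDEpAA==")
print("start:", start)
print("end:", end)
```

This only applies to the PDF entries; decoding to ASCII will fail on values that are not plain text.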

The ",6," must be the page of the book (0-indexed), since it matches the page attribute of the annotation. As for the rest, I'm not sure yet. More experimentation needed.
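
To help with that experimentation, here is a small Python sketch that splits a decoded location into its comma-separated fields and reports which positions differ between start and end. The function names are hypothetical, and the sketch makes no claim about what the fields mean beyond the page guess above.

```python
import re


def parse_pdfloc(location):
    """Split '#pdfloc(a,b,c,...)' into a list of field strings."""
    match = re.fullmatch(r"#pdfloc\(([^)]*)\)", location)
    if match is None:
        raise ValueError("not a #pdfloc string: %r" % location)
    return match.group(1).split(",")


def differing_fields(start, end):
    """Return the 0-based positions where the two locations disagree."""
    pairs = zip(parse_pdfloc(start), parse_pdfloc(end))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


start_loc = "#pdfloc(870b,6,127,0,0,0,0,1)"
end_loc = "#pdfloc(870b,6,165,0,31,0,1,1)"

print("page field:", int(parse_pdfloc(start_loc)[1]))
print("fields that differ:", differing_fields(start_loc, end_loc))
```

Running the same comparison over a few highlights of different lengths and positions should make it easier to guess what the changing fields track.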

LRF files seem to be handled differently: their start and end values decode into a binary format rather than text. For example, "QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=" turns into:

00000000 42 65 62 42 20 00 00 00 27 3d 52 07 04 00 00 00 |BebB ...'=R.....|
00000010 00 00 00 00 16 e4 02 00 00 00 00 00 00 00 00 00 |................|
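
A dump like the one above can be produced with a few lines of Python. This is just a sketch of a generic hex dump (the `hexdump` helper is my own, not part of any reader tooling), and it does not attempt to interpret the binary layout.

```python
import base64


def hexdump(data, width=16):
    """Format bytes as offset, hex bytes, and printable ASCII per line."""
    pad = width * 3 - 1  # width of a full row of hex bytes
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Non-printable bytes are shown as dots, as in the dump above.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x %s |%s|" % (offset, hex_part.ljust(pad), text_part))
    return "\n".join(lines)


raw = base64.b64decode("QmViQiAAAAAnPVIHBAAAAAAAAAAW5AIAAAAAAAAAAAA=")
print(hexdump(raw))
```

The leading "BebB" bytes look like a marker of some kind, but that is only a guess from this one sample.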

Guess this is a start for someone to chew on. I'll probably look into it more as the days/weeks go on.

Post #27 · kitlaan · Junior Member · Posts: 4 · Karma: 18 · Join Date: Nov 2008 · Device: PRS-700