Since my last post I have been examining the .annot files on the SONY PRS-700.
For unencrypted (non-DRM) PDF files, the XML that needs to be extracted is reasonably straightforward. I can see where the ID for a "highlight" or "bookmark" is stored, and also where the highlighted text and/or the annotation text is stored.
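To illustrate the PDF case, here is a minimal Python sketch of what such an extractor might look like. I have not documented the actual .annot schema here, so the element and attribute names (`annotations`, `annotation`, `type`, `id`, `text`, `note`) and the sample data are all hypothetical placeholders; they would need to be replaced with whatever the real files use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: every element and attribute name below is an
# assumption for illustration, not the real .annot schema.
SAMPLE = """<annotations>
  <annotation type="highlight" id="1">
    <text>first highlighted passage</text>
  </annotation>
  <annotation type="bookmark" id="2">
    <note>come back to this page</note>
  </annotation>
</annotations>"""


def extract_annotations(xml_text):
    """Return one dict per <annotation> element found in xml_text."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("annotation"):
        results.append({
            "id": node.get("id"),
            "type": node.get("type"),
            # findtext returns None when the child element is absent
            "text": node.findtext("text"),
            "note": node.findtext("note"),
        })
    return results


if __name__ == "__main__":
    for item in extract_annotations(SAMPLE):
        print(item)
```

This only covers plain, readable XML. It would not help with formats where the highlighted text is stored in an encoded form.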
For other formats, it is not so easy. The BBeB format, for example, has many of the items in different locations, and the highlighted text appears to be stored in some sort of encrypted form. So I guess only SONY or ADOBE (or a clever hacker) can write the extractor code for this format.
Clearly an "annotation extractor" can be written, though it will work better for some formats than others. Does anyone know of any activity in this area (by SONY or anyone else)?