MobileRead Forums - iLiad Teasing 2: extract snippets/tag PDFs

daudi · 04-27-2008, 03:00 AM

Quote:

Originally Posted by nekokami

Excellent!
Can we categorize snippets when we capture them? Or afterward, when reviewing them in the HTML version?

I can't think of a way to categorize them as we capture them. One thing I find handy about this approach is that marking up text for extraction is so quick and simple that it doesn't interrupt the flow of reading. One possibility might be to link this with Rio's tease (on the Mac). It shouldn't be hard to use the same approach to mark up handwritten scribbles for extraction, which could then be run through character recognition software. That's something I can't help with, though, as I don't have a Mac.

The snippets could, however, easily be edited at a later stage as they have a very simple structure. Here's one:

Quote:

Page: 2 x: 159.204963032--346.022710981 y: 566.062090781--644.585947887
Conclusion?These data con?rm an inverse association between socioeconomic
status and the prevalence of type 2 diabetes in the middle years of life. This

Because they are so easy to edit, it would make sense to agree on a simple structure for them, e.g. that categories always go on the line directly after the page header. Keeping the format this simple makes it easy to write awk scripts that process the snippets on the iLiad itself. It should also be fairly simple to write a small PC application (in Python or Java) to manage them. I could imagine something that shows the hierarchy of documents on the iLiad (mounted over USB or Samba, or read from the iLiad's CF card); when you open a document, you see an entry for each page that has extracted text, with a preview of the text or image and a field for entering categories.
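As a rough idea of the parsing side of such a PC tool, here is a minimal Python sketch. It assumes only the header format visible in the sample snippet above (`Page: N x: a--b y: c--d`, followed by the extracted text). The `Categories:` line directly after the header is a hypothetical convention along the lines suggested here, not something the extraction currently writes, and `Snippet` / `parse_snippets` are illustrative names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Header line as it appears in the sample snippet, e.g.
# "Page: 2 x: 159.204963032--346.022710981 y: 566.062090781--644.585947887"
HEADER_RE = re.compile(
    r"^Page:\s*(\d+)\s+x:\s*([\d.]+)--([\d.]+)\s+y:\s*([\d.]+)--([\d.]+)\s*$"
)
# Hypothetical convention: an optional line directly after the header,
# such as "Categories: diabetes, socioeconomic status".
CATEGORY_PREFIX = "Categories:"


@dataclass
class Snippet:
    page: int
    x: tuple[float, float]  # x range copied from the header
    y: tuple[float, float]  # y range copied from the header
    categories: list[str] = field(default_factory=list)
    text: str = ""


def parse_snippets(raw: str) -> list[Snippet]:
    """Split snippet text into records, one per 'Page:' header line."""
    snippets: list[Snippet] = []
    current: Snippet | None = None
    body: list[str] = []
    first_line = False  # True while we are on the line right after a header

    for line in raw.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A new header closes the previous record.
            if current is not None:
                current.text = "\n".join(body).strip()
                snippets.append(current)
            page, x0, x1, y0, y1 = header.groups()
            current = Snippet(int(page), (float(x0), float(x1)), (float(y0), float(y1)))
            body = []
            first_line = True
        elif current is None:
            continue  # ignore anything before the first header
        else:
            if first_line and line.startswith(CATEGORY_PREFIX):
                names = line[len(CATEGORY_PREFIX):].split(",")
                current.categories = [n.strip() for n in names if n.strip()]
            else:
                body.append(line)
            first_line = False

    if current is not None:
        current.text = "\n".join(body).strip()
        snippets.append(current)
    return snippets


if __name__ == "__main__":
    sample = (
        "Page: 2 x: 159.204963032--346.022710981 y: 566.062090781--644.585947887\n"
        "Categories: diabetes, socioeconomic status\n"
        "Conclusion?These data con?rm an inverse association between socioeconomic\n"
        "status and the prevalence of type 2 diabetes in the middle years of life. This\n"
    )
    for snip in parse_snippets(sample):
        print(snip.page, snip.categories, snip.text.splitlines()[0])
```

Snippets without a `Categories:` line still parse, so files written before any such convention would keep working; the same one-line-after-the-header rule is also easy to follow from an awk script on the iLiad.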

[BTW, notice that in the extract above the hyphen and the 'fi' ligature have been converted to '?', because I still need help understanding encoding schemes.]
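A '?' in place of a character is typical of text being forced into ASCII with unrepresentable characters replaced. The Python sketch below assumes that is what happens here (the actual extraction code may do something else). The 'fi' ligature is a single code point, U+FB01, and Unicode NFKC normalization turns it back into plain "fi". The `DASHES` table and `to_plain_ascii` are hypothetical additions, needed because NFKC leaves most dash characters unchanged.

```python
import unicodedata

# Hypothetical mapping for dash-like characters; NFKC leaves most of
# these unchanged (it only folds U+2011 into U+2010).
DASHES = str.maketrans({
    "\u2010": "-",   # hyphen
    "\u2011": "-",   # non-breaking hyphen
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u00ad": "",    # soft hyphen (only marks a possible line break)
})


def to_plain_ascii(text: str) -> str:
    """Fold ligatures and dashes to ASCII before any lossy conversion."""
    # NFKC turns compatibility characters such as U+FB01 into "fi".
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(DASHES)
    # Anything still outside ASCII becomes '?', as in the extract.
    return text.encode("ascii", errors="replace").decode("ascii")


extracted = "con\ufb01rm"  # one ligature code point inside the word
print(extracted.encode("ascii", errors="replace"))  # b'con?rm'
print(to_plain_ascii(extracted))                     # confirm
```

An alternative that avoids losing anything is to write the snippet files as UTF-8 (e.g. `open(path, "w", encoding="utf-8")`) and only normalize when a plain-ASCII copy is actually needed.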