Quote:
Originally Posted by nekokami
Wow! Please keep us posted. I'm very interested in this functionality. And I know a Java programmer who might help.
|
Will do. If you could ask your Java programmer whether there is a Java-only way of getting an XML representation of a PDF, that would help to direct things. At the moment I am using pdftohtml to do that. pdftohtml is based on xpdf code, and I think that is the same as, or related to, the poppler library. AFAICT these are all C (C++?). I did a quick search for a Java implementation of poppler but could not find anything. Perhaps some other Java library can do this, but I don't know enough about Java to look around efficiently.
The key thing is that the output of pdftohtml -xml is an XML file with the coordinates of each line of text. If this can be done using Java alone, then it will be easy to implement the rest and make the whole thing platform independent.
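To give an idea of what "the rest" has to work with, here is a rough sketch of reading the line coordinates back out of that XML. It is in Python purely for illustration (my own code is in R), and the sample document is made up: I am assuming the usual pdftohtml -xml layout of page elements containing text elements with top, left, width and height attributes. The names SAMPLE_XML and extract_lines are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `pdftohtml -xml` output; all values are made up.
SAMPLE_XML = """<pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1188" width="918">
    <text top="108" left="135" width="320" height="17" font="0">First line of text</text>
    <text top="130" left="135" width="298" height="17" font="0">Second line with <b>bold</b> words</text>
  </page>
</pdf2xml>"""


def extract_lines(xml_string):
    """Return one dict per <text> element: page number, bounding box and text."""
    root = ET.fromstring(xml_string)
    lines = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for text in page.iter("text"):
            lines.append({
                "page": page_number,
                "top": int(text.get("top")),
                "left": int(text.get("left")),
                "width": int(text.get("width")),
                "height": int(text.get("height")),
                # itertext() also picks up text inside nested tags such as <b>
                "text": "".join(text.itertext()),
            })
    return lines


if __name__ == "__main__":
    for line in extract_lines(SAMPLE_XML):
        print(line["page"], line["top"], line["left"], line["text"])
```

Any language with an XML parser can do the same thing, so the only platform-dependent step is producing the XML in the first place.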
Quote:
Originally Posted by nekokami
I've used R, but I had no idea you could do stuff like this with it.
|
I use R for my work almost every day (not necessarily very deeply), so I'm more familiar with it than with anything else. The R code to do this uses none of the things that make R special; I'm just using it as a scripting language. I've used R for all sorts of things unrelated to statistics. I suppose one good reason for using R here is that it is so easy to use its plotting facilities to plot the scribbles. This helped with debugging.