Cool. It was always in the back of my mind to write a script to implement column detection and a few other goodies from the output of pdf2xml, but I never found the time/motivation.
I'll be willing to integrate this into calibre (after the 0.6 release), so open a ticket and attach your script. Integration will depend on how easy it is to compile pdf2xml on various platforms.