Great. Let me know if you need help. It would be great to have this worked out and tested for windows. First you'll need to have pdftohtml working for windows. I saw a couple of references to a windows implementation in passing, but did not look at these too carefully. Assuming that is able to run from the command line and can create an xml representation of PDFs then the python script should run fine on windows.
|