Quote:
Originally Posted by kovidgoyal
The C++ part was removed, it now uses pdftohtml -xml to generate the layout XML. However, I haven't gotten around to migrating the python code that reads the XML and converts it to HTML. That should be fairly simple to do. The python code still expects the old version of the XML, so you will need to change it slightly.
Thank you. I notice the pdftohtml I have is version 0.20, while there is a version 0.40 available on Sourceforge (I am running Windows 7). Is there a reason for this choice, or is there simply no build of 0.40 for Windows?
Also, do I understand correctly that migrating the Python XML -> HTML code is a job that still needs doing for the community? If so, I will gladly take it on.
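
To make sure I have the right idea of the job, here is a rough sketch of what I imagine the XML -> HTML step looks like. It is only my guess, not the existing calibre code: I am assuming the output of pdftohtml -xml has page elements containing text elements with top and left attributes, and the function name layout_xml_to_html and the sample input are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET


def layout_xml_to_html(xml_text):
    """Convert pdftohtml-style layout XML into very plain HTML.

    Assumption: each <page> holds <text> elements that carry
    'top' and 'left' position attributes. Fonts are ignored.
    """
    root = ET.fromstring(xml_text)
    parts = []
    for page in root.iter("page"):
        number = html.escape(page.get("number", ""), quote=True)
        parts.append('<div class="page" id="page-%s">' % number)
        # Order the text runs top to bottom, then left to right.
        runs = sorted(
            page.iter("text"),
            key=lambda t: (float(t.get("top", 0)), float(t.get("left", 0))),
        )
        for run in runs:
            # itertext() flattens inline children such as <b> or <i>.
            content = "".join(run.itertext()).strip()
            if content:
                parts.append("<p>%s</p>" % html.escape(content))
        parts.append("</div>")
    return "\n".join(parts)


# Hypothetical sample input, written by hand for this sketch.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="200" left="100" width="50" height="12" font="0">second &amp; last</text>
<text top="100" left="100" width="50" height="12" font="0"><b>first</b> line</text>
</page>
</pdf2xml>"""

if __name__ == "__main__":
    print(layout_xml_to_html(SAMPLE))
```

This emits one paragraph per text run, so real code would still need to merge runs into lines and paragraphs and carry over the font information.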