07-06-2012, 11:45 AM   #16
Borodin
Quote:
Originally Posted by kovidgoyal
The C++ part was removed; it now uses

pdftohtml -xml

to generate the layout XML. However, I haven't gotten around to migrating the Python code that reads the XML and converts it to HTML. That should be fairly simple to do. The Python code still expects the old version of the XML, so you will need to change it slightly.
Thank you. I notice the pdftohtml I have is version 0.20, while there is a version 0.40 available on SourceForge (I am running Windows 7). Is there a reason for this choice, or is there simply no build of 0.40 for Windows?

Also, do I understand correctly that migrating the Python XML -> HTML code is a job that still needs doing for the community? If so, I will gladly take it on.
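
To make the scope of that job concrete, here is a minimal sketch of reading layout XML and emitting HTML. It is not the existing calibre code. It assumes the XML has the shape that pdftohtml -xml commonly produces: a pdf2xml root, page elements with a number attribute, and text elements with top and left attributes. The sample document and the function name layout_xml_to_html are made up for illustration, and the real output may differ between pdftohtml versions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample, modelled on the assumed shape of `pdftohtml -xml` output.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="12" family="Times" color="#000000"/>
<text top="120" left="90" width="150" height="14" font="0">Second line</text>
<text top="100" left="90" width="200" height="14" font="0">Hello <b>world</b></text>
</page>
</pdf2xml>
"""


def layout_xml_to_html(xml_text):
    """Convert layout XML to very plain HTML, one <p> per <text> element."""
    root = ET.fromstring(xml_text)
    parts = []
    for page in root.iter("page"):
        number = escape(page.get("number", ""), quote=True)
        parts.append('<div class="page" id="page-%s">' % number)
        # Put fragments in reading order: top to bottom, then left to right.
        fragments = sorted(
            page.iter("text"),
            key=lambda t: (float(t.get("top", 0)), float(t.get("left", 0))),
        )
        for fragment in fragments:
            # itertext() flattens inline markup such as <b> and <i>.
            content = escape("".join(fragment.itertext()))
            parts.append("<p>%s</p>" % content)
        parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(layout_xml_to_html(SAMPLE_XML))
```

A real converter would also need to merge fragments into lines and paragraphs and map fontspec entries to styles; the sketch only sorts fragments by position and drops the inline formatting.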