Old 07-18-2012, 07:49 AM   #32
rocketdocs
rocketdocs developer
I understand your skepticism, and I completely agree with you that no other tool on the market can convert a PDF to HTML consistently and accurately.

First, let me make myself perfectly clear: our software doesn't fully automate this process, as many others try to do, because that's a futile attempt. We've gone down that road and there are too many variables in play. Also, absolutely positioned HTML elements are just a ridiculous notion.

To your point(s), you can't just take a PDF, run it through some magic tool, and expect it to spit out perfect HTML every time. But you can use typography techniques and other algorithms to make sense of all the underlying PDF code and create those "semantic units", as you call them. We've got this working to about 80% accuracy already, and it will only get better over time. The other 20% is handled by using our web-based editor to tell our software what those semantics are (e.g. p, h1, ul, footnote).
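To give a rough idea of what I mean by typography techniques, here is a toy sketch in Python. To be clear, this is not our actual code: the `Run` type, the 1.5x heading threshold, the bullet characters, and the "review" label are all made up for illustration, and it assumes the text runs and their font sizes have already been pulled out of the PDF. It guesses a tag from font size alone and flags anything ambiguous for a human to fix, which is roughly the split between the automatic part and the editor part.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical text run already extracted from a PDF."""
    text: str
    font_size: float


# Assumed bullet markers; real documents use many more.
BULLETS = ("\u2022", "-", "*")


def body_size(runs):
    """Treat the font size that covers the most characters as body text."""
    weights = Counter()
    for run in runs:
        weights[run.font_size] += len(run.text)
    return weights.most_common(1)[0][0]


def guess_tag(run, body):
    """Guess a semantic tag, or 'review' when the heuristic is unsure."""
    text = run.text.strip()
    if text.startswith(BULLETS):
        return "li"
    if run.font_size >= body * 1.5:   # assumed threshold for a heading
        return "h1"
    if abs(run.font_size - body) < 0.5:
        return "p"
    return "review"                   # footnote? caption? ask a human


def classify(runs):
    """Return (tag, text) pairs for every run."""
    if not runs:
        return []
    body = body_size(runs)
    return [(guess_tag(run, body), run.text) for run in runs]
```

Feed `classify` a list of `Run` objects and it hands back a tag for each one. Anything tagged "review" is the kind of thing that would go to a person in an editor instead of being guessed at.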

Here's an example of a PDF we converted: http://bit.ly/Nzxrmq. It's 126 pages, and it took us perhaps a day to convert to the WCAG 2.0-compliant HTML and EPUB you see on that page.