The final release version of deskUNPDF Professional is spec'd to perform PDF-HTML conversion, handle pdf-BBeB TOC conversions and internal links, the OCR engine will be enabled for extracting text from images and fixing text from PDFs with non-standard font encodings (all of this is detailed in the readme file). On the pdftohtml->html2lrf solution, I can tell you that deskUNPDF will outperform pdftohtml in creating structured text, paragraphs etc, from PDFs hands down. Besides this, doing an extra conversion (pdf-html-lrf vs pdf-lrf) is always going to me more lossy.
|