02-24-2010, 10:58 AM   #2
kovidgoyal
creator of calibre
 
It uses a custom PDF-to-XML engine based on poppler (which is what pdftohtml uses as well). It handles line wrapping automatically and fixes various shortcomings of pdftohtml, such as missing support for the Table of Contents and rotated images.

I'm currently too busy to work on this, so if you want to take it on, be my guest. The code is in calibre/ebooks/pdf.

You can invoke it like this:

Code:
ebook-convert file.pdf .epub -vvvv --debug-pipeline p --new-pdf-engine
It will error out, but before it does, it will create two files in p/input: index.html and index.xml.

The XML file is generated by the new engine, and the HTML file is generated from that XML by the code in calibre.ebooks.pdf.reflow.
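
If you end up running this repeatedly while hacking on reflow, a small wrapper might save some typing. The following is only a sketch, not part of calibre: the helper names (build_command, find_debug_outputs, run_new_engine) are made up for illustration, and it assumes ebook-convert is on your PATH and that the flags behave as in the command above.

```python
import subprocess
from pathlib import Path

# File names taken from the post; every helper name below is hypothetical.
EXPECTED_FILES = ("index.html", "index.xml")


def build_command(pdf_path, debug_dir="p"):
    """Return the ebook-convert command line quoted in the post as a list."""
    return [
        "ebook-convert", str(pdf_path), ".epub",
        "-vvvv", "--debug-pipeline", str(debug_dir), "--new-pdf-engine",
    ]


def find_debug_outputs(debug_dir="p"):
    """Map each expected file name to its path, or None if it is missing."""
    input_dir = Path(debug_dir) / "input"
    return {
        name: (input_dir / name if (input_dir / name).is_file() else None)
        for name in EXPECTED_FILES
    }


def run_new_engine(pdf_path, debug_dir="p"):
    """Run the conversion, then report which debug files were written."""
    # check=False because the conversion is expected to error out
    # after it has written the files into <debug_dir>/input.
    subprocess.run(build_command(pdf_path, debug_dir), check=False)
    return find_debug_outputs(debug_dir)
```

Calling run_new_engine("file.pdf") should return a dict with the paths of whichever of the two files exist. If ebook-convert is not installed, subprocess.run raises FileNotFoundError rather than returning.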

Currently the engine is pretty much done; the code in reflow still needs to be completed.