Old 03-29-2011, 09:13 PM   #24
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by kiwidude
the first pages of this at least are converted to images.
The current PDF engine does not support text under images. This is why you can select the text in the PDF, but it comes through the conversion as an image.

Quote:
Originally Posted by kiwidude
I also discovered that it is the cover page that causes the Calibre conversion to run so slowly in this instance
calibre uses pdftohtml to turn the PDF into HTML, which it then cleans up and converts. There are easily 100 special processing rules to clean up the HTML from pdftohtml, all of them regular expression based. Most likely those pages are producing very complex and messy output, which is causing a large number of rules to be run.
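
To give a rough idea of what a regular-expression cleanup pass looks like, here is a minimal sketch in Python. The rules and the `clean_html` function are made up for illustration and are not calibre's actual rules or code. The point is only that every rule scans the whole document, so messy input that matches many rules costs more.

```python
import re

# Hypothetical cleanup rules for illustration only; NOT calibre's real rules.
RULES = [
    # Treat a double line break as a paragraph boundary.
    (re.compile(r"<br\s*/?>\s*<br\s*/?>", re.IGNORECASE), "</p><p>"),
    # Rejoin a word that was hyphenated across a line break.
    (re.compile(r"(\w)-\s*\n\s*(\w)"), r"\1\2"),
    # Collapse runs of spaces or tabs into a single space.
    (re.compile(r"[ \t]{2,}"), " "),
]


def clean_html(html):
    """Apply every rule in order; return the cleaned text and the
    total number of substitutions made."""
    substitutions = 0
    for pattern, replacement in RULES:
        html, count = pattern.subn(replacement, html)
        substitutions += count
    return html, substitutions


cleaned, substitutions = clean_html("exam-\nple  text<br><br>more")
print(cleaned, substitutions)
```

With around 100 such rules, each one is a full pass over the HTML, so a page that produces a lot of tangled markup can slow the whole conversion down noticeably.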

No one has any desire to fix the existing engine. A new PDF engine is in the works but development has stalled. Finishing the new engine would be a better time investment than trying to further work around pdftohtml issues.