MobileRead Forums - PDF -> HTML conversion

roffLOL · 10-02-2011, 03:44 PM

Quote:

Originally Posted by user_none

Link?

What is the performance? E.g. memory and CPU usage? Time to complete? A while back I evaluated a few pure Python PDF libraries as replacements for the current PDF engine and found them to be up to 60x slower than the current engine, without much gain in output quality. This is partly why a large part of the new PDF engine is being written in C++.

I am not done yet and have no reliable numbers. However, my current implementation takes 20 seconds to convert a 350-page book and write it to disk (before any optimization). I can't compare this directly with Calibre's current converter, since I can't measure how much of Calibre's time is spent parsing the HTML; however, for the same book, Calibre's conversion takes about as long as mine. As for quality, mine IS MUCH better (unless you have improved it greatly in the last couple of months). I haven't done any quantitative quality tests yet. I will run the conversion on nearly 500 PDFs, and I won't be satisfied until it handles all of them nearly perfectly. I am very pedantic about my reading experience.

I will upload the project somewhere, maybe next week or the week after. When it comes to non-coding matters, I am a lazy bastard.