Quote:
Originally Posted by bustacap
Thank you for the update, but I don't think this part is working quite as intended.
Instead of just extracting the images (which should take like a second or two, apart from the EPUB generation), it's re-encoding all of them, which takes quite a bit of time...
The process is more complex and time-consuming than it might first appear. The plugin can quickly determine whether or not a page contains a single image. Deciding whether that image actually matches what is shown on the page takes much more work. PDF offers many ways for a rendered page to look quite different from an image it contains: the image can be cropped by a clipping path, scaled or rotated by a transformation matrix, partially masked, or overlaid with text and vector graphics.
The plugin renders each PDF page and then checks that the extracted image closely conforms to it before accepting that image as a substitute for the page. Right now that comparison is fairly slow; I intend to optimize it in the future.
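As a rough illustration of this kind of acceptance check (not the plugin's actual code), the sketch below assumes that both the rendered page and the candidate image are already available as grayscale pixel grids (lists of rows of 0-255 values). The function names, the 32x32 comparison grid and the tolerance of 8 are hypothetical choices. Shrinking both images onto a small common grid before comparing is one common way to keep such a check cheap.

```python
def downsample(img, size):
    """Box-average a grayscale image (a list of rows of 0-255 values) onto a size x size grid."""
    h, w = len(img), len(img[0])
    grid = []
    for gy in range(size):
        # Row band for this cell; always at least one source row, even for tiny images.
        y0 = gy * h // size
        y1 = max((gy + 1) * h // size, y0 + 1)
        row = []
        for gx in range(size):
            x0 = gx * w // size
            x1 = max((gx + 1) * w // size, x0 + 1)
            total = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            row.append(total / ((y1 - y0) * (x1 - x0)))
        grid.append(row)
    return grid


def mean_abs_diff(a, b):
    """Average absolute difference between two equally sized grids."""
    cells = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / cells


def images_match(rendered_page, extracted_image, size=32, tolerance=8.0):
    """Hypothetical acceptance test: True if the image looks like the rendered page."""
    # The comparison itself runs on size x size cells; downsampling is the
    # only pass that touches every full-resolution pixel.
    small_page = downsample(rendered_page, size)
    small_image = downsample(extracted_image, size)
    return mean_abs_diff(small_page, small_image) <= tolerance


page = [[x + y for x in range(100)] for y in range(80)]  # stand-in for a rendered page
half = [row[::2] for row in page[::2]]                   # same picture at half resolution
inverted = [[255 - v for v in row] for row in page]      # clearly different picture
print(images_match(page, half), images_match(page, inverted))  # True False
```

A real implementation would more likely use an imaging library for the rendering and resizing; plain lists are used here only to keep the sketch self-contained. Because the same picture at a different resolution still matches, the check tolerates the extracted image having a different pixel size from the rendered page.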
Quote:
Originally Posted by bustacap
...and the file size is smaller than the original.
It turns out that the image extraction function in the PDF library I use (pypdf) recompresses the extracted image, which lowers image quality. I am working on an improved method that avoids this.
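One common way to avoid this kind of loss, sketched here as an assumption rather than as the method being worked on: when an image stream uses only the /DCTDecode filter, its encoded bytes are already a JPEG file and can be written out unchanged instead of being decoded and recompressed. The helper names are hypothetical, and fetching the still-encoded stream bytes from the PDF library is left out because that depends on the library's API. A stream that carries a /Decode array or an unusual color space may still need special handling.

```python
def passthrough_extension(filters, data):
    """Return ".jpg" if an image stream can be saved byte-for-byte, else None.

    `filters` is the image's /Filter entry (one name or a list of names) and
    `data` is the still-encoded stream content.
    """
    if isinstance(filters, str):
        filters = [filters]
    # A lone DCTDecode filter means the stream is already a complete JPEG file;
    # checking the SOI marker (FF D8) guards against mislabelled data.
    if list(filters) == ["/DCTDecode"] and data[:2] == b"\xff\xd8":
        return ".jpg"
    # Flate/LZW-compressed or chained filters need decoding (and a lossless
    # re-encode such as PNG) instead.
    return None


def save_image_stream(filters, data, stem):
    """Hypothetical helper: write the original JPEG bytes untouched when possible."""
    ext = passthrough_extension(filters, data)
    if ext is None:
        return None  # caller falls back to decoding and re-encoding
    path = stem + ext
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Writing the original bytes keeps both the image quality and the file size identical to what is embedded in the PDF.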