MobileRead Forums - View Single Post

jhowell · 08-12-2025, 04:55 PM

Quote:

Originally Posted by bustacap

thank you for the update but i dont think this part is working quite as intended.
instead of just extracting the images (which should take like a second or two, apart from the epub generation) its re-encoding all of them, which takes quite a bit of time...

The process is more complex and time-consuming than it might first appear. The plugin can quickly determine whether or not a page contains a single image. It takes a lot more work to then decide whether or not that image reflects what is shown on the page. PDF offers a large number of ways in which the rendered page can look very different from an image it contains.

The plugin renders each PDF page and then checks that the extracted image closely conforms to the rendered page before that image will be accepted as a substitute. Right now that comparison process is fairly slow. I intend to optimize it in the future.
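
To give a rough idea of what such a check involves, here is a minimal sketch. It is not the plugin's actual code. It assumes both the rendered page and the extracted image have already been decoded to grayscale pixel rows of identical size, and it uses a mean absolute difference as the similarity measure. The function names and the tolerance value are hypothetical, chosen only for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two grayscale images.

    Each image is a list of rows, each row a list of 0-255 integers.
    (Assumed representation; a real implementation would likely work on
    decoded bitmaps from an imaging library.)
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def image_matches_page(rendered, extracted, tolerance=4.0):
    """Accept the extracted image only if it is close to the rendered page.

    The tolerance is a hypothetical threshold, not a value from the plugin.
    """
    return mean_abs_diff(rendered, extracted) <= tolerance


# Tiny 2x2 example: one near-identical image, one clearly different image.
rendered = [[10, 20], [30, 40]]
close = [[11, 19], [30, 42]]
different = [[200, 20], [30, 40]]
print(image_matches_page(rendered, close))
print(image_matches_page(rendered, different))
```

Even a simple measure like this has to visit every pixel of every page, which is part of why the comparison takes far longer than extraction alone.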

Quote:

Originally Posted by bustacap

...and the file size is smaller than the original.

It turns out that the image extraction function in the PDF library I use (pypdf) recompresses the extracted image, which lowers the image quality. I am working on an improved method that will avoid this.