Old 08-12-2025, 04:55 PM   #1063
jhowell
Quote:
Originally Posted by bustacap View Post
thank you for the update but i dont think this part is working quite as intended.
instead of just extracting the images (which should take like a second or two, apart from the epub generation) its re-encoding all of them, which takes quite a bit of time...
The process is more complex and time-consuming than it might first appear. The plugin can quickly determine whether or not a page contains a single image. It takes a lot more work to then decide whether or not that image reflects what is shown on the page. PDF has a large number of ways in which the rendered page can look very different from an image it contains.

The plugin renders each PDF page and then checks that the extracted image closely conforms to the rendered result before that image will be accepted as a substitute. Right now that comparison process is fairly slow. I intend to optimize it in the future.
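
To make the idea of "closely conforms" concrete, here is a minimal sketch of one possible comparison, not the plugin's actual code. It assumes both the rendered page and the extracted image have already been scaled to the same size and converted to grayscale grids (lists of rows of 0-255 values). The function names and the tolerance value are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equal-sized
    grayscale grids (lists of rows, values 0-255)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def image_matches_page(rendered, extracted, tolerance=8.0):
    """Accept the extracted image only if it is close to the rendered page.
    The tolerance is an illustrative assumption, not a tuned value."""
    return mean_abs_diff(rendered, extracted) <= tolerance
```

A check like this has to visit every pixel of every page, which is one reason a comparison step can be slow on image-heavy PDFs.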

Quote:
Originally Posted by bustacap View Post
...and the file size is smaller than the original.
It turns out that the image extraction function in the PDF library I use (pypdf) recompresses the extracted image resulting in lowered image quality. I am working on an improved method that will avoid this.
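
For illustration only, and not a description of the improved method, one general way to avoid recompression is to copy the image stream's bytes unchanged when its PDF filter already denotes a standalone image format. In PDF, a stream encoded with /DCTDecode holds JPEG data and one encoded with /JPXDecode holds JPEG 2000 data. The helper below is hypothetical and only decides whether such a passthrough is possible from the filter name(s).

```python
# Filters whose encoded stream bytes are already a complete image file.
PASSTHROUGH_FILTERS = {
    "/DCTDecode": ".jpg",   # JPEG
    "/JPXDecode": ".jp2",   # JPEG 2000
}


def passthrough_extension(filters):
    """Return a file extension if the raw stream can be saved as-is,
    or None if the image would need to be decoded and re-encoded.

    `filters` is a single filter name or a list of names, as found in
    the /Filter entry of an image stream (hypothetical input shape).
    """
    if isinstance(filters, str):
        filters = [filters]
    # A chain of several filters means the bytes are wrapped in another
    # encoding, so they are not directly usable as an image file.
    if len(filters) != 1:
        return None
    return PASSTHROUGH_FILTERS.get(filters[0])
```

Even when a passthrough is possible, color spaces, masks, or decode arrays in the PDF can make the raw image differ from what the page shows, so a comparison against the rendered page would still be needed.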