How does PDF compression work?

tatagi · 02-28-2023, 01:24 AM

As most people are well aware, an EPUB holds all the files that make up the whole document (text, images, fonts, metadata, table of contents, etc.) in compressed form. An EPUB is basically a zipped folder.
This isn't the case for PDF files. A PDF is like a Word document in that it puts all the elements together in a single file, so we can't simply replace image1 with image2 using Windows' built-in copy & paste.

So if you want to compress an EPUB file, you just open those image files in any image compression tool found on SourceForge or GitHub and run the compression. Easy-peasy.
Most tools work almost the same way. For JPG files, we just decide how lossy the image can get, which is most useful when the image is very complex (like nature photos), so a human can't easily tell the difference from the original. For PNG, instead of that kind of lossy compression, the number of colors is limited (most probably from 24-bit to 8-bit color), which works very well for reducing file size with not-very-colorful images like scanned text, line art, graphs and the like.
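
Because an EPUB is an ordinary ZIP archive, the image files described above can be found with nothing more than a ZIP reader. Here is a minimal sketch using Python's standard `zipfile` module; the function name `list_epub_images` and the extension list are illustrative assumptions, not part of any EPUB tool.

```python
import zipfile

# Assumed set of image extensions; extend it if your books use others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp")

def list_epub_images(epub):
    """Return (name, stored_size, original_size) for each image in an EPUB.

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith(IMAGE_EXTENSIONS)
        ]
```

Calling `list_epub_images("book.epub")` (a hypothetical file name) shows which images are the largest and therefore worth recompressing. JPG and PNG files are already compressed internally, so the ZIP layer usually shrinks them very little; the savings come from re-encoding the images themselves.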

But what about PDF files? Since there's no "cover.png" or "image07.jpg" inside a PDF, how do tools know which algorithm would work best for each image?
For example, scanned text looks much better as an 8-bit or sometimes even 1-bit PNG than as a heavily compressed JPG, which has an unavoidable problem: artifacts.
Can a compression tool apply the best-suited compression method to each of the different images in a PDF file?

And if possible, please recommend a good compression tool for PDF and EPUB.

Thanks.

Quoth · 02-28-2023, 06:06 AM

A PDF is an envelope that can have different things inside: sometimes only one kind of thing, sometimes layers on a page hold different kinds of thing, and different pages can have different internals.
Images can be different kinds of vector graphics, can be generated at display time by PostScript, or can be raster images. Different kinds of raster image can be encapsulated.
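
To make "different kinds of raster images can be encapsulated" concrete: each raster image in a PDF is stored as its own stream object, and that object's dictionary names the filter used to encode it (for example `DCTDecode` for JPEG data, `FlateDecode` for zlib-compressed pixels, `CCITTFaxDecode` or `JBIG2Decode` for 1-bit scans). The sketch below is only a rough illustration using the standard library. It scans the raw bytes instead of parsing the file properly, so it misses images held inside compressed object streams and can be confused by unusual files. The function name `image_filters` is made up for this example.

```python
import re
from collections import Counter

def image_filters(pdf_bytes):
    """Count the /Filter names declared by image objects in raw PDF bytes.

    Crude byte-level scan, not a real PDF parser (see the caveats above).
    """
    counts = Counter()
    for chunk in pdf_bytes.split(b"endobj"):
        # The object's dictionary comes before the first "stream" keyword.
        header = chunk.split(b"stream", 1)[0]
        if not re.search(rb"/Subtype\s*/Image\b", header):
            continue
        # /Filter is either a single name or an array of names.
        match = re.search(rb"/Filter\s*(\[[^\]]*\]|/\w+)", header)
        names = re.findall(rb"/(\w+)", match.group(1)) if match else []
        key = "+".join(name.decode("ascii") for name in names) or "none"
        counts[key] += 1
    return counts
```

Running `image_filters(open("scan.pdf", "rb").read())` on a hypothetical file gives a quick picture of how its images are stored. Since every image carries its own filter, a PDF tool can in principle treat a 1-bit scan differently from a colour photo; whether a particular tool actually does so depends on that tool and its settings.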

A PDF isn't much like an MS Word document, though it is more like one than like an EPUB.

So there is no single approach to PDFs; it depends on what is in them and the quality you want. They need different tools from the ones used for images inside an EPUB.

Each PDF page records the page size, what is on the page, which layer each item is on and where it sits relative to the page.
See k2pdfopt, GIMP, ImageMagick, etc.
