MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 04-21-2018, 02:38 PM

Quote:

Originally Posted by Ramo

Thank you, willus.
Just sent it via PM.

As I suspected, the images are stored in JPEG 2000 format (you can see this with the k2pdfopt -i option), which taxes most PDF readers significantly more than JPEG or PNG. Moreover, they are 600 dpi, which is very high resolution. That is probably why your reader struggles to display the file, not the hidden text. The default k2pdfopt output is PNG ("Flate"), which is much faster to display but, as you noted, balloons the file size considerably depending on your chosen resolution and color depth. You might try leaving OCR enabled (-ocr m) rather than disabling it. I'll bet it will still work fine, and you'll then be able to search the document.
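If you want a rough second check on the image format without k2pdfopt, here is a minimal Python sketch. It is not how k2pdfopt -i works; it is only a heuristic that scans the raw PDF bytes for the stream filter names the PDF format uses: /JPXDecode (JPEG 2000), /DCTDecode (JPEG), and /FlateDecode (Flate, the PNG-style compression). The function name count_filters is made up for illustration. Note that Flate is also used for non-image streams such as page content, so that count is not images only.

```python
import re
from collections import Counter

# Stream filter names defined by the PDF format, mapped to readable labels.
FILTER_LABELS = {
    b"JPXDecode": "JPEG 2000",
    b"DCTDecode": "JPEG",
    b"FlateDecode": "Flate",
}


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each known filter name appears in raw PDF bytes.

    Heuristic only (an assumption of this sketch): it does not parse the
    PDF, so it cannot tell image streams from other streams.
    """
    counts = Counter()
    for name, label in FILTER_LABELS.items():
        pattern = rb"/" + re.escape(name) + rb"\b"
        counts[label] = len(re.findall(pattern, pdf_bytes))
    return counts
```

To try it on a file (scan.pdf is a placeholder name), read the file in binary mode and pass the bytes in: count_filters(open("scan.pdf", "rb").read()). A large "JPEG 2000" count and few or no "JPEG" entries would be consistent with what I described above.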

There is no trivial way to remove only the hidden text from a PDF and leave everything else exactly as it is. I could maybe make the method I showed you easier to use with a single command-line option that tries to choose the parameters intelligently, but as for leaving all of the bitmaps in exactly their original format (highly compressed JPEG 2000), I don't have a way to do that.