04-21-2018, 02:38 PM   #1544
willus
Fuzzball, the purple cat
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Ramo
Thank you, willus.
Just sent it via PM.
As I suspected, the images are stored in JPEG 2000 format (you can see this when you use the k2pdfopt -i option), which taxes most PDF readers significantly more than JPEG or PNG. Moreover, they are 600 dpi, which is very high resolution. That is probably why your reader does not like displaying the file, rather than the hidden text. The default k2pdfopt output is PNG ("Flate"), which is much faster to display but, as you noted, balloons the file size considerably, depending on your chosen resolution and color depth. You might try leaving OCR selected (-ocr m) rather than disabling it. I'll bet it will still work fine, and you'll then be able to search the document.
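
As a rough cross-check of the image format that does not depend on k2pdfopt, the sketch below counts the standard PDF stream filter names in a file's raw bytes: /JPXDecode marks JPEG 2000, /DCTDecode marks JPEG, and /FlateDecode marks Flate. The function name and labels are made up for illustration, and this is only a heuristic under stated assumptions: Flate is also used for non-image streams such as page contents and fonts, and filter names stored inside compressed object streams will not be seen at all.

```python
import re
from collections import Counter

# Standard PDF stream filter names mapped to the compression they denote.
# The labels on the right are illustrative, not taken from any tool's output.
FILTER_LABELS = {
    b"JPXDecode": "JPEG 2000",
    b"DCTDecode": "JPEG",
    b"FlateDecode": "Flate (PNG-style)",
}


def count_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of known filter names in raw PDF bytes.

    Heuristic only: it does not parse the PDF, so it counts every
    mention of a filter name, not just those on image objects.
    """
    counts = Counter()
    pattern = rb"/(JPXDecode|DCTDecode|FlateDecode)\b"
    for match in re.finditer(pattern, pdf_bytes):
        counts[FILTER_LABELS[match.group(1)]] += 1
    return counts
```

To try it, read a PDF in binary mode (for example `open("scan.pdf", "rb").read()`, where `scan.pdf` is a placeholder name) and pass the bytes to `count_stream_filters`. A file whose page images are JPEG 2000 should show a "JPEG 2000" count roughly matching its page count.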

There is no trivial way to simply remove hidden text from a PDF and leave everything else exactly the way it is. I could maybe make the method I showed you easier to use by adding a single command-line option that tries to choose the parameters intelligently, but I don't have a way to leave all of the bitmaps in exactly their original format (highly compressed JPEG 2000).