04-22-2018, 08:23 AM   #1545
Ramo
Enthusiast
Posts: 25
Karma: 37930
Join Date: Mar 2018
Device: Kobo TouchC
Quote:
Originally Posted by willus
As I suspected, the images are stored in JPEG 2000 format (you can see this when you use the k2pdfopt -i option), which taxes most PDF readers significantly more than JPEG or PNG. Moreover, they are 600 dpi--very high res. That is probably why your reader does not like displaying the file--not because of the hidden text. The default k2pdfopt output is PNG ("Flate"), which is much faster to display, but, as you noted, balloons the file size considerably depending on your chosen resolution and color depth. You might try leaving OCR selected (-ocr m) rather than disabling it. I'll bet it will still work fine and you'll then be able to search the document.

There is not a trivial way to simply remove hidden text from a PDF and leave everything else exactly the way it is. I could maybe make it easier to use the method I showed you with a single command-line option to try to intelligently choose the parameters, but in terms of leaving all of the bitmaps in exactly their original format (highly compressed JPEG 2000), I don't have a way to do that.

Thank you! It is not the perfect one-button solution for all my problems, but now I understand what is happening!

I learned about JPEG 2000 just two minutes ago, when I downloaded a set of scanned images from archive.org and could not get Scan Tailor to work on them. Talk about synchronicity!

Way better support than I've ever had from any company! You're awesome!

Just out of curiosity, do you have a guess as to whether KOReader would do a better job with this kind of PDF than the standard Nickel software on my Kobo TouchC? And how did you find out the resolution of the images in the PDF? Is there an option for that in k2pdfopt? I couldn't find it. And are the JPX and JBIG2 shown in brackets in the -i output the file formats of the images, then?