02-14-2023, 10:31 PM   #9
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by rkomar
I tried a PDF I created earlier where the text pages were in JBIG2 format. After doing the OCR with k2pdfopt, the output PDF was about 10x larger and the images were in some other format. I did not see an option to keep the original images in the output PDF. Is it not possible to do that?

The OCR results seemed pretty good judging by some random searches I tried. I would like to OCR all of the PDFs I created if I could keep them close to the original file size.
Correct. At this time k2pdfopt cannot just add a text layer while keeping the original image formats (I mentioned that in my previous post in this thread). I hope to add that capability at some point, though I'm not entirely sure how to code it yet. You can probably get that 10x down to a more reasonable number by choosing the output dpi and the number of bits per pixel carefully.
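
To give a rough feel for why output dpi and bits per pixel matter so much, here is a minimal Python sketch that computes the raw (uncompressed) bitmap size of a single page. It is only back-of-the-envelope arithmetic, not anything k2pdfopt does internally: the function name `raw_page_bytes` and the 8.5 x 11 inch page size are assumptions made for illustration, and real PDFs compress their images, so actual file sizes will be smaller and will depend on the codec.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one page rendered as a bitmap."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical US Letter page (8.5 x 11 inches) at a few settings.
for dpi, bpp in [(300, 8), (300, 1), (200, 1), (150, 1)]:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{dpi} dpi, {bpp} bits/pixel: {size / 1024:.0f} KiB raw")
```

The raw size scales with the square of the dpi and linearly with the bits per pixel, so going from 8-bit grayscale to 1-bit bitonal cuts the raw data by a factor of 8, and dropping from 300 to 150 dpi cuts it by another factor of 4.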

Alternatively, there's a way to use cpdf to move a text layer from one PDF into another, but I don't have access to how I did that until tomorrow. I'll post the solution then.