MobileRead Forums

willus · 02-14-2023, 10:31 PM

Quote:

Originally Posted by rkomar

I tried a PDF I created earlier where the text pages were in JBIG2 format. After doing the OCR with k2pdfopt, the output PDF was about 10x larger and the images were in some other format. I did not see an option to keep the original images in the output PDF. Is it not possible to do that?

The OCR results seemed pretty good judging by some random searches I tried. I would like to OCR all of the PDFs I created if I could keep them close to the original file size.

Correct: at this time it is not possible for k2pdfopt to just add a text layer while keeping the original image formats (I mentioned that in my previous post in this thread). I hope to add that capability at some point, though I'm not entirely sure how to code it yet. You can probably bring that 10x increase down to something more reasonable by carefully choosing the output dpi and the number of bits per pixel.
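As a rough illustration of why those two settings matter, here is a minimal Python sketch of the *uncompressed* bitmap size of one page. It assumes a simple raster whose rows are padded to whole bytes. The helper name `raw_page_bytes` is hypothetical and is not part of k2pdfopt. Real PDF image sizes depend on the compression codec (JBIG2 is very efficient for 1-bit text), so treat this only as a guide to the scaling: size grows with the square of the dpi and linearly with the bits per pixel.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bitmap size in bytes for one page (hypothetical helper).

    Assumes each pixel row is padded to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Bytes needed per row, rounding partial bytes up.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 in) at a few dpi / bit-depth combinations.
    for dpi, bpp in [(300, 8), (300, 1), (150, 1)]:
        print(dpi, "dpi,", bpp, "bpp:", raw_page_bytes(8.5, 11, dpi, bpp), "bytes")
```

For a Letter-size page, going from 8 bits to 1 bit per pixel cuts the raw size by about 8x, and halving the dpi cuts it by roughly another 4x.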

Alternatively, there's a way to use cpdf to move a text layer from one PDF into another, but I won't have access to the details of how I did that until tomorrow. I'll post the solution then.