Thank you for a nice how-to post with excellent detail. FYI, you can do this same thing with k2pdfopt, as I described in
this recent post. The simplest way is:
k2pdfopt -mode copy -c- -o output.pdf source.pdf
The -c- option converts to grayscale. The attached files show an example of this operation: the source and output PDF files. The contrast and gamma of the source pages are automatically adjusted, and the output pages are saved as PNG images inside the PDF (there is an option to save as JPEG; see below).
Advantages: OCR is retained, so there is no need to redo it. Automatic contrast adjustment is applied.
Drawbacks: If the source PDF uses advanced compression like JPX (as in my example), that compression is lost, and the output file will typically be larger (or lower quality if JPEG with low quality setting is used) compared to the source file.
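If you have several files to convert, the command above can be wrapped in a small shell function. This is only a sketch: the gray_copy name, the _gray.pdf output naming, and the DRY_RUN switch are my own assumptions, not part of k2pdfopt. By default it just prints the command it would run, so you can check it first.

```shell
#!/bin/sh
# Sketch: grayscale copy of one PDF using the command from this post.
# gray_copy, the _gray.pdf suffix, and DRY_RUN are hypothetical helpers.
gray_copy() {
    src=$1
    out=${src%.pdf}_gray.pdf    # book.pdf -> book_gray.pdf
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: only print the command line that would be executed.
        printf '%s\n' "k2pdfopt -mode copy -c- -o $out $src"
    else
        # DRY_RUN=0: actually run k2pdfopt (must be on your PATH).
        k2pdfopt -mode copy -c- -o "$out" "$src"
    fi
}
```

To process a folder, something like `for f in *.pdf; do gray_copy "$f"; done` should work; set DRY_RUN=0 once the printed commands look right.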
Some other k2pdfopt options that will impact the output file (complete list here):
-mode trim -n-
Instead of copying the pages exactly as they are, trims off excess white space. The -n- turns off native output so that the grayscale conversion will still occur. This can be combined with the -ac option (see below) for pages with lots of scanning artifacts at the edges.
-ac
Autocrop scanned pages—similar to what ScanTailor tries to do on scanned pages with copying artifacts at the edges. Off by default. (This option has been improved in k2pdfopt v2.42.)
-dpi <dpi>
Set the output resolution in pixels per inch.
-de <pts>
Ignore defects smaller than <pts> in size (1 pt = 1/72 inch). Helps trim and autocrop work better on poor quality scans.
-c
Output in full color (or simply omit -c-).
-jpg <quality>
Write the output as JPEG with the given quality level (1 to 100).
-bpc <nn>
Use <nn> bits per pixel in PNG output (1 to 8 bpc allowed)
-cmax <value>
Set max contrast adjustment. Can be set to 1.0 for no adjustment. Default is 2.0.
-g <gamma>
Set gamma adjustment. Defaults to 0.5. Use 1.0 for no gamma adjustment. The 0.5 value tends to darken the text, which improves its appearance on many e-readers.
-er <n>
Applies “erosion” filter, which tends to thicken text. Default is 0 for the erosion factor (no erosion). Try 1 or 2 at first.
-dw
De-warp scanned pages. (Now available in k2pdfopt v2.42 and up.) Similar to ScanTailor’s de-warping function. When copied book pages aren’t laid flush onto the copying surface, the copy can appear warped. This option tries to undo this.
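To show how several of these options fit together on one command line, here is a rough sketch for a poor-quality scan. The scan_cleanup_cmd helper is hypothetical, and the values (1.5 pt defect size, gamma 0.7, JPEG quality 85) are illustrative guesses rather than recommendations; tune them for your own scans. The function only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# Sketch: assemble a k2pdfopt command line for a rough scan.
# scan_cleanup_cmd is a hypothetical helper; the numeric values are examples only.
scan_cleanup_cmd() {
    src=$1
    out=$2
    # -mode trim comes first so the flags after it can refine what the mode sets:
    #   -n-      turn off native output so grayscale conversion still happens
    #   -ac      autocrop scanning artifacts at the page edges
    #   -de 1.5  ignore defects smaller than 1.5 pt
    #   -c-      grayscale output
    #   -g 0.7   gamma adjustment (milder than the 0.5 default)
    #   -jpg 85  save pages as JPEG at quality 85 instead of PNG
    printf '%s\n' "k2pdfopt -mode trim -n- -ac -de 1.5 -c- -g 0.7 -jpg 85 -o $out $src"
}
```

Calling `scan_cleanup_cmd source.pdf output.pdf` prints the full command, which you can then paste into a terminal or adjust.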
[PS. I had these options all nicely formatted in a table in MS Word, then copied and pasted that into the reply editor on MR, and it seemed to take perfectly: it showed the table and everything. But when I previewed the post, the table got mangled. Bummer.]