Converting a PDF to B&W
I recently downloaded a scanned book where all the pages were brown from age. Why this particular book was scanned in color, I don't know. It had no images and no color, other than the browned pages.
I did some experimenting with ImageMagick, and here are my results. First, you need ImageMagick and Ghostscript installed (both are open source). Open a command prompt and, from the ImageMagick folder, run the "convert.exe" program.
To convert the PDF to a B&W TIFF, use this command:
convert -density 288 book.pdf -threshold 50% -type bilevel -despeckle -resample 96 book.tif
You can play with the "density" value. The "resample" parameter downsamples the final TIFF to 96 DPI, which is good for on-screen viewing; you can omit it if you like. When resampling, try to keep the density number three times the resample number, as in the command above (288 = 3 x 96).
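The link between the two numbers can be made explicit with a small sketch. It assumes a POSIX-style shell (on Windows, something like Git Bash or WSL) rather than the plain command prompt, and the variable names target_dpi, factor and density are made up for illustration; they are not ImageMagick options.

```shell
# Pick the final DPI and an oversampling factor, then derive -density from them.
target_dpi=96   # resolution of the final TIFF (the -resample value)
factor=3        # 288 / 96 in the command above
density=$((target_dpi * factor))

# Print the resulting command so it can be checked before running it.
echo "convert -density $density book.pdf -threshold 50% -type bilevel -despeckle -resample $target_dpi book.tif"
```

Changing target_dpi to 150, for example, would print a command with a density of 450.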
To recreate a PDF from the TIFF, using JPEG compression:
convert book.tif -compress jpeg book-bw.pdf
Note that the output PDF name is different from the original, to prevent overwriting.
You can optionally add a "-quality nn" parameter to adjust the JPEG compression.
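As a rough example of the quality setting, again assuming a POSIX-style shell: the value 60 below is an arbitrary starting point, not a recommendation from my tests. Lower numbers should give a smaller file with more visible JPEG artifacts.

```shell
# Arbitrary example value; try a few and compare file size and legibility.
quality=60

# Only run if the TIFF from the previous step is present.
if [ -f book.tif ]; then
  convert book.tif -compress jpeg -quality "$quality" book-bw.pdf
fi
```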
I had pretty decent results with this. One peculiarity, however: the original color PDF is half the size of my final B&W PDF, using the commands above. I don't know what compression was used in the original.
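One possible explanation, offered as a guess and not something tested for this post: JPEG is designed for continuous-tone images and tends to do poorly on pure black-and-white pages. ImageMagick also accepts "-compress Group4" (CCITT Group 4 fax compression, which is meant for bilevel images), and that may produce a much smaller PDF. The sketch below assumes a POSIX-style shell, and the output name book-bw-g4.pdf is made up for this example.

```shell
in=book.tif
out=book-bw-g4.pdf   # hypothetical name, chosen so book-bw.pdf is not overwritten

# Only run if the TIFF from the earlier step is present.
if [ -f "$in" ]; then
  convert "$in" -compress Group4 "$out"
  # Compare the sizes of the JPEG-compressed and Group 4 PDFs.
  ls -l book-bw.pdf "$out"
fi
```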
Note that since you are converting the original PDF to TIFF images, you will lose all OCR'ed text. I OCR'ed it again and all was fine.