Converting a PDF to B&W

jgray · 05-08-2017, 07:11 PM

I recently downloaded a scanned book where all the pages were brown from age. Why this particular book was scanned in color, I don't know. It had no images and no color, other than the browned pages.

I did some experimenting with ImageMagick and here are my results. First, you need ImageMagick and Ghostscript installed (both open source). Open a command prompt and, from the ImageMagick folder, run the "convert.exe" program.

To convert the PDF to a B&W TIFF, use this command:

convert -density 288 book.pdf -threshold 50% -type bilevel -despeckle -resample 96 book.tif

You can play with the "density" value. The "resample" parameter downsamples the final TIFF to 96 DPI, which is good for on-screen viewing. You can omit it, if you like. Try to keep the density number four times the resample number, when resampling.
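
As a rough illustration of the density rule of thumb above, here is a minimal shell sketch. The density_for helper is a hypothetical name, not part of ImageMagick, and the sketch assumes a POSIX-style shell (on Windows, something like Git Bash or WSL) rather than the plain command prompt. It only prints the command so the numbers can be checked before anything is run. Note that the 288 in the command above is three times 96; four times would be 384.

```shell
# density_for is a hypothetical helper, not part of ImageMagick.
# It multiplies the target (resample) DPI by a factor, 4 by default.
density_for() {
    target_dpi=$1
    factor=${2:-4}
    echo $(( target_dpi * factor ))
}

target=96
density=$(density_for "$target")

# Print the command instead of running it, so the values can be checked first.
echo "convert -density $density book.pdf -threshold 50% -type bilevel -despeckle -resample $target book.tif"
```

Calling density_for 96 3 gives the 288 used in the original command.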

To recreate a PDF from the TIFF, using JPEG compression:

convert book.tif -compress jpeg book-bw.pdf

Note that the output PDF name is different from the original, to prevent overwriting.

You can optionally add a "-quality nn" parameter to adjust the JPEG compression.
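
To show where the optional "-quality" setting would go, here is a small sketch. The build_pdf_cmd helper is a made-up name for illustration, and it only prints the command line rather than running ImageMagick. It assumes a POSIX-style shell, and the quality value of 75 is just an example, not a recommendation.

```shell
# build_pdf_cmd is a hypothetical helper that only prints a command line.
# $1 = input TIFF, $2 = output PDF, $3 = optional JPEG quality (1-100).
build_pdf_cmd() {
    in_tif=$1
    out_pdf=$2
    quality=${3:-}
    if [ -n "$quality" ]; then
        echo "convert $in_tif -compress jpeg -quality $quality $out_pdf"
    else
        echo "convert $in_tif -compress jpeg $out_pdf"
    fi
}

build_pdf_cmd book.tif book-bw.pdf
build_pdf_cmd book.tif book-bw.pdf 75
```

The first call reproduces the command given above; the second adds the quality setting just before the output file name.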

I had pretty decent results with this. One peculiarity, however: the original color PDF is half the size of my final B&W PDF, using the commands above. I don't know what compression was used in the original, though.
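
If you want to put a number on the size difference, a sketch like the following could be used. The size_percent helper is a hypothetical name; it assumes a POSIX-style shell with "wc" available and reports the second file's size as a whole-number percentage of the first.

```shell
# size_percent is a hypothetical helper: prints the size of file $2
# as a whole-number percentage of the size of file $1.
size_percent() {
    orig_bytes=$(wc -c < "$1")
    new_bytes=$(wc -c < "$2")
    if [ "$orig_bytes" -eq 0 ]; then
        echo "original file is empty" >&2
        return 1
    fi
    echo $(( new_bytes * 100 / orig_bytes ))
}

# Example call, assuming both files exist in the current folder:
# size_percent book.pdf book-bw.pdf
```

A result of 200 would match the case described above, where the B&W PDF is twice the size of the original.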

Note that since you are converting the original PDF to TIFF images, you will lose all OCR'ed text. I OCR'ed it again and all was fine.

willus · 05-13-2017, 10:31 AM

Thank you for a nice how-to post with excellent detail. FYI, you can do this same thing with k2pdfopt, as I described in this recent post. The simplest way is:

k2pdfopt -mode copy -c- -o output.pdf source.pdf

The -c- converts to grayscale. The attached files show an example of this operation--source and output PDF files. The contrast and gamma of the source pages are automatically adjusted, and the output is saved as .PNG inside the PDF (there is an option to save as JPEG--see below).

Advantages: OCR is retained—no need to re-do it. Automatic contrast adjustment is applied.

Drawbacks: If the source PDF uses advanced compression like JPX (as in my example), that compression is lost, and the output file will typically be larger (or lower quality if JPEG with low quality setting is used) compared to the source file.

Some other k2pdfopt options that will impact the output file (complete list here):

-mode trim -n-
Instead of copying the pages exactly as they are, trims off excess white space. The -n- turns off native output so that the grayscale conversion will still occur. This can be combined with the -ac option (see below) for pages with lots of scanning artifacts at the edges.

-ac
Autocrop scanned pages—similar to what ScanTailor tries to do on scanned pages with copying artifacts at the edges. Off by default. (This option has been improved in k2pdfopt v2.42.)

-dpi
Set the output resolution in pixels per inch.

-de <pts>
Ignore defects smaller than <pts> in size (1 pt = 1/72 inch). Helps trim and autocrop work better on poor quality scans.

-c
(Or just leave out -c-.) Output in full color.

-jpg <quality>
Write the output in JPEG with the given quality level (1 to 100).

-bpc <nn>
Use <nn> bits per pixel in PNG output (1 to 8 bpc allowed)

-cmax <value>
Set max contrast adjustment. Can be set to 1.0 for no adjustment. Default is 2.0.

-g <gamma>
Set gamma adjustment. Defaults to 0.5. Use 1.0 for no gamma adjustment. The 0.5 value tends to darken the text, which improves its appearance on many e-readers.

-er <n>
Applies “erosion” filter, which tends to thicken text. Default is 0 for the erosion factor (no erosion). Try 1 or 2 at first.

-dw
De-warp scanned pages. (Now available in k2pdfopt v2.42 and up.) Similar to ScanTailor’s de-warping function. When copied book pages aren’t laid flush onto the copying surface, the copy can appear warped. This option tries to undo this.
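
As a sketch of how the options above might be combined with the basic copy command, here is a small shell helper. k2_cmd is a made-up name for illustration; it only prints a k2pdfopt command line using flags listed in this post (-mode copy, -c-, -jpg, -o) and does not run k2pdfopt. It assumes a POSIX-style shell, and the quality value of 85 is only an example.

```shell
# k2_cmd is a hypothetical helper that prints a k2pdfopt command line.
# $1 = source PDF, $2 = output PDF, $3 = optional JPEG quality (1-100).
k2_cmd() {
    src=$1
    out=$2
    jpg_quality=${3:-}
    opts="-mode copy -c-"
    if [ -n "$jpg_quality" ]; then
        # Without -jpg, the output is saved as PNG inside the PDF.
        opts="$opts -jpg $jpg_quality"
    fi
    echo "k2pdfopt $opts -o $out $src"
}

k2_cmd source.pdf output.pdf
k2_cmd source.pdf output.pdf 85
```

The first call prints the simple grayscale copy command; the second adds JPEG output at the given quality.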

[PS. I had these options all nicely formatted in a table in MS Word and then copied and pasted that into the reply editor on MR and it seemed to take perfectly--showed the table and everything. But then when I previewed the post, it got mangled / undone. Bummer.]

desk7 · 05-30-2017, 04:46 PM

There is a small utility that converts a PDF to a lightweight black & white (NOT greyscale) PDF. It's called Aktomat, and you can download it from http://apps.kuczynski.pl/

