MobileRead Forums > E-Book Formats > PDF
Old 05-08-2017, 07:11 PM   #1
jgray
Fanatic
Posts: 547
Karma: 2928497
Join Date: Mar 2008
Device: Clara 2E & Sage
Converting a PDF to B&W

I recently downloaded a scanned book where all the pages were brown from age. Why this particular book was scanned in color, I don't know. It had no images and no color, other than the browned pages.

I did some experimenting with ImageMagick and here are my results. First, you need ImageMagick and Ghostscript installed (both open source). Open a command prompt in the ImageMagick folder and run the "convert.exe" program.

To convert the PDF to a B&W TIFF, use this command:

convert -density 288 book.pdf -threshold 50% -type bilevel -despeckle -resample 96 book.tif

You can play with the "density" value. The "resample" parameter downsamples the final TIFF to 96 DPI, which is good for on-screen viewing. You can omit it, if you like. When resampling, try to keep the density number three to four times the resample number (the command above uses 288, which is three times 96).
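
To make the density/resample relationship concrete, here is a minimal Python sketch (not part of the original post). The helper names are hypothetical, and the US Letter page size (8.5 x 11 inches) is an assumption chosen only for the arithmetic. It works out how many pixels a page has at each resolution and assembles the same convert command as an argument list.

```python
import shlex


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page rendered at the given resolution."""
    return (round(width_in * dpi), round(height_in * dpi))


def bw_tiff_args(src_pdf, dst_tif, density=288, resample=96, threshold="50%"):
    """Argument list for the PDF -> B&W TIFF command shown in the post.

    Passing resample=None leaves out the downsampling step.
    """
    args = ["convert", "-density", str(density), src_pdf,
            "-threshold", threshold, "-type", "bilevel", "-despeckle"]
    if resample is not None:
        args += ["-resample", str(resample)]
    return args + [dst_tif]


# Assumed page size: US Letter, 8.5 x 11 inches.
rendered = page_pixels(8.5, 11, 288)  # size while thresholding and despeckling
final = page_pixels(8.5, 11, 96)      # size after -resample 96
print(rendered, final, 288 / 96)      # (2448, 3168) (816, 1056) 3.0

# Print the command line instead of running it, so nothing is overwritten.
print(shlex.join(bw_tiff_args("book.pdf", "book.tif")))
```

The list returned by bw_tiff_args can be handed to subprocess.run(args, check=True) once ImageMagick and Ghostscript are installed and on the PATH.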

To recreate a PDF from the TIFF, using JPEG compression:

convert book.tif -compress jpeg book-bw.pdf

Note that the output PDF name is different from the original, to prevent overwriting.

You can optionally add a "-quality nn" parameter to adjust the JPEG compression.

I had pretty decent results with this. One peculiarity, however: the original color PDF is half the size of my final B&W PDF made with the commands above. I don't know what compression was used in the original.
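
One possible contributor to the size increase is that JPEG is designed for continuous-tone images rather than two-level ones. ImageMagick also accepts "-compress Group4" (CCITT Group 4 fax coding, intended for black-and-white images), which may give a smaller file. The Python sketch below is not from the original post; the helper name, the quality value of 75, and the book-bw-g4.pdf file name are illustrative assumptions. It builds both variants so the resulting sizes can be compared. One caveat: the "-resample" step may leave gray pixels along edges, and Group 4 can only store black and white, so the text may look slightly different.

```python
import shlex


def pdf_args(src_tif, dst_pdf, compress="jpeg", quality=None):
    """Argument list for rebuilding a PDF from the TIFF.

    compress is passed to ImageMagick's -compress option;
    quality (1-100) is only meaningful for JPEG.
    """
    args = ["convert", src_tif, "-compress", compress]
    if quality is not None:
        if not 1 <= quality <= 100:
            raise ValueError("quality must be between 1 and 100")
        args += ["-quality", str(quality)]
    return args + [dst_pdf]


# The JPEG variant from the post (with an example quality value)
# and an assumed Group 4 alternative written to a different file.
jpeg_cmd = pdf_args("book.tif", "book-bw.pdf", quality=75)
g4_cmd = pdf_args("book.tif", "book-bw-g4.pdf", compress="Group4")

for cmd in (jpeg_cmd, g4_cmd):
    print(shlex.join(cmd))
```

Running both commands and comparing the two output files against the original is the only reliable way to tell which compression suits a particular scan.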

Note that since you are converting the original PDF to TIFF images, you will lose all OCR'ed text. I OCR'ed it again and all was fine.
Old 05-13-2017, 10:31 AM   #2
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Thank you for a nice how-to post with excellent detail. FYI, you can do this same thing with k2pdfopt, as I described in this recent post. The simplest way is:

k2pdfopt -mode copy -c- -o output.pdf source.pdf

The -c- converts to grayscale. The attached files show an example of this operation--source and output PDF files. The contrast and gamma of the source pages are automatically adjusted, and the output is saved as .PNG inside the PDF (there is an option to save as JPEG--see below).

Advantages: OCR is retained—no need to re-do it. Automatic contrast adjustment is applied.

Drawbacks: If the source PDF uses advanced compression like JPX (as in my example), that compression is lost, and the output file will typically be larger (or lower quality if JPEG with low quality setting is used) compared to the source file.
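
As a worked example of that size penalty, the attachments listed with this post are 188.5 KB (source.pdf) and 324.1 KB (output.pdf). A small Python sketch of the arithmetic, with a hypothetical helper name:

```python
def growth(source_kb, output_kb):
    """Return (size ratio, percent increase) of the output over the source."""
    ratio = output_kb / source_kb
    return ratio, (ratio - 1) * 100


# Attachment sizes as listed in the post.
ratio, pct = growth(188.5, 324.1)
print(f"{ratio:.2f}x the source size, {pct:.0f}% larger")
# prints: 1.72x the source size, 72% larger
```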

Some other k2pdfopt options that will impact the output file (complete list here):

-mode trim -n-
Instead of copying the pages exactly as they are, trims off excess white space. The -n- turns off native output so that the grayscale conversion will still occur. This can be combined with the -ac option (see below) for pages with lots of scanning artifacts at the edges.

-ac
Autocrop scanned pages—similar to what ScanTailor tries to do on scanned pages with copying artifacts at the edges. Off by default. (This option has been improved in k2pdfopt v2.42.)

-dpi
Set the output resolution in pixels per inch.

-de <pts>
Ignore defects smaller than <pts> in size (1 pt = 1/72 inch). Helps trim and autocrop work better on poor quality scans.

-c
(Or just omit -c-.) Output in full color.

-jpg <quality>
Write the output in JPEG with the given quality level (1 – 100)

-bpc <nn>
Use <nn> bits per pixel in PNG output (1 to 8 bpc allowed)

-cmax <value>
Set max contrast adjustment. Can be set to 1.0 for no adjustment. Default is 2.0.

-g <gamma>
Set gamma adjustment. Defaults to 0.5. Use 1.0 for no gamma adjustment. The 0.5 value tends to darken the text, which improves its appearance on many e-readers.

-er <n>
Applies “erosion” filter, which tends to thicken text. Default is 0 for the erosion factor (no erosion). Try 1 or 2 at first.

-dw
De-warp scanned pages. (Now available in k2pdfopt v2.42 and up.) Similar to ScanTailor’s de-warping function. When copied book pages aren’t laid flush onto the copying surface, the copy can appear warped. This option tries to undo this.
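
The options above can be combined on one command line. The following Python sketch is only an illustration: the helper names are hypothetical, the values 1.5 pt, 300 dpi and 1 bpc are arbitrary examples, and it assumes that -dpi, -de and -bpc each take their value as the next argument. It also converts a -de defect size from points to pixels using the 1 pt = 1/72 inch rule given above.

```python
import shlex


def pts_to_pixels(pts, dpi):
    """Length in pixels of a size given in points (1 pt = 1/72 inch)."""
    return pts * dpi / 72


def k2pdfopt_args(src, dst, mode="copy", grayscale=True, extra=()):
    """Argument list for k2pdfopt using only options named in the post."""
    args = ["k2pdfopt", "-mode", mode]
    if grayscale:
        if mode == "trim":
            # Per the post: -n- turns off native output so that the
            # grayscale conversion still happens in trim mode.
            args.append("-n-")
        args.append("-c-")
    args += list(extra)
    return args + ["-o", dst, src]


# Simplest form, matching the command in the post.
print(shlex.join(k2pdfopt_args("source.pdf", "output.pdf")))

# Trim mode with autocrop, defect filtering and 1-bit PNG output.
# With 1 bit per pixel there are only 2**1 = 2 levels: black and white.
cmd = k2pdfopt_args("source.pdf", "output.pdf", mode="trim",
                    extra=["-ac", "-de", "1.5", "-dpi", "300", "-bpc", "1"])
print(shlex.join(cmd))

# A 1.5 pt defect at 300 dpi covers this many pixels:
print(pts_to_pixels(1.5, 300))  # 6.25
```

Printing the command first makes it easy to check before passing the same list to subprocess.run(cmd, check=True).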

[PS. I had these options all nicely formatted in a table in MS Word and then copied and pasted that into the reply editor on MR, and it seemed to take perfectly: it showed the table and everything. But then when I previewed the post, it got mangled / undone. Bummer.]
Attached Files
File Type: pdf source.pdf (188.5 KB, 372 views)
File Type: pdf output.pdf (324.1 KB, 374 views)

Last edited by willus; 05-21-2017 at 01:06 PM. Reason: Updated with release of k2pdfopt v2.42
Old 05-30-2017, 04:46 PM   #3
desk7
Groupie
Posts: 150
Karma: 24934
Join Date: May 2016
Device: Kindle Paperwhite, Onyx Boox Max
There is a small utility that converts a PDF to a lightweight black-and-white (NO greyscale) PDF. It's called Aktomat and you can download it from http://apps.kuczynski.pl/

Last edited by desk7; 05-30-2017 at 05:03 PM.
All times are GMT -4.