09-04-2019, 08:28 PM   #2
Tex2002ans
Quote:
Originally Posted by roger64
In order to pre-process image files with scantailor, I may have to convert some source PDF to png files.

There are some online services that do this, I prefer doing it using imagemagick.
Good choice.

Quote:
Originally Posted by roger64
Code:
convert garnier.pdf garnier.png
convert: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `garnier.png' @ warning/png.c/MagickPNGWarningHandler/1748.
That warning can probably be completely ignored.

From what I can tell, an RGB ICC (color) profile picked up from the PDF is being embedded in a PNG that is itself grayscale, and the PNG writer is complaining about the mismatch (see the technical note below).

If you want the warning to go away, and don't care about the metadata, just add a -strip:

Code:
convert -strip garnier.pdf garnier.png
You could continue to add whatever other adjustments you want:

Code:
convert -density 300 -strip garnier.pdf garnier.png
You could also remove the transparency and make the background white:

Code:
convert -density 300 -strip garnier.pdf -background white -alpha off garnier.png
or even use the mogrify command instead:

Code:
mogrify -format png -density 300 -strip -background white -alpha off garnier.pdf
Side Note: For more info on mogrify and batch processing, see the ol' IMv6 Basic Usage (mogrify).
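For a whole folder of PDFs, a plain shell loop around convert is one alternative to mogrify. A rough sketch, assuming ImageMagick's convert is on your PATH; pdfs_to_pngs is a name made up for illustration, and the options are simply the ones used above:

```shell
# Sketch: convert every PDF in a directory to PNG with the options from this post.
# pdfs_to_pngs is a hypothetical helper name; it assumes convert is on PATH.
pdfs_to_pngs() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory contains no PDFs.
        [ -e "$pdf" ] || continue
        # Multi-page PDFs come out as numbered files: name-0.png, name-1.png, ...
        convert -density 300 -strip "$pdf" -background white -alpha off "${pdf%.pdf}.png"
    done
}
```

Calling pdfs_to_pngs ~/scans would process every PDF in that (example) folder. Unlike mogrify, the loop makes it easy to use different settings per file later on.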

Quote:
Originally Posted by roger64
It converted nearly instantly all the pages which is pretty good but I am not sure to understand the information above. Has somebody some knowledge about it?
Technical Note: I tested a PDF on my end and got a similar "RGB color space not permitted" warning. When I ran:

Code:
identify -verbose output.png
on it and compared the stripped/unstripped PNGs, these were the chunks of metadata that -strip removed:

Spoiler:
Code:
  Resolution: 300x300
  Print size: 8.5x11
  [...]
    icc:copyright: Copyright Artifex Software 2011
    icc:description: Artifex Software sRGB ICC Profile
    pdf:Version: PDF-1.5
    [...]
    png:bKGD: chunk was found (see Background color, above)
    png:pHYs: x_res=300, y_res=300, units=0
    png:text: 4 tEXt/zTXt/iTXt chunks were found
    png:text-encoded profiles: 1 were found
    png:tIME: 2019-09-04T23:45:09Z
    [...]
  Profiles:
    Profile-icc: 2576 bytes
    [...]


I assume the few icc lines were what ImageMagick was warning about.

The PNG itself says it's grayscale, but the embedded ICC metadata within the PNG was trying to say it was some sort of sRGB.

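These are probably carryovers from the PDF side: either metadata from when the original person generated/scanned the pages, or the default sRGB profile added by Ghostscript (Artifex's PDF renderer, which ImageMagick normally calls to rasterize PDFs).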
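The stripped/unstripped comparison from the technical note could be scripted roughly like this. It is only a sketch: metadata_diff is a made-up helper name, and it assumes identify and diff are on your PATH and that you have already produced both PNGs:

```shell
# Sketch: show which identify -verbose lines differ between two images.
# metadata_diff is a hypothetical helper name, not part of ImageMagick.
metadata_diff() {
    before=$(mktemp) || return 1
    after=$(mktemp) || return 1
    identify -verbose "$1" > "$before"
    identify -verbose "$2" > "$after"
    # diff exits with status 1 when the files differ, which is the expected case here.
    diff "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
}
```

Usage would be something like metadata_diff unstripped.png stripped.png. Lines starting with < exist only in the first file, so that is where the icc entries should show up. A few unrelated lines, such as the filename, will differ too.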

Quote:
Originally Posted by roger64
Even adding parameters like -quality 100, or -density 300, one such image has a 27k only size, while the same image processed with, say pdfcandy online service at medium resolution has a 55k size (see screenshot). Does this difference may hinder the ocr process later?
... who knows what kinds of commands they run on that online service. With ImageMagick, you control the entire workflow.

And every PDF is going to be different, so you may need to do different kinds of tweaks for different things (DPI, speckling cleanup, etc.).

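ImageMagick Note: PNG is lossless, so -quality on a PNG never changes the pixels. It only controls how the file gets compressed (zlib compression level and PNG filter type), which affects the file size but not what the OCR will see.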

JPG is lossy, so -quality is a sliding scale from 1-100 on how hideous you want the images to be.

See ImageMagick's page on -quality for more info.
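For PNG, the -quality number is really two settings packed together: the tens digit is the zlib compression level and the ones digit is the PNG filter type. A tiny sketch of that arithmetic (png_quality_parts is a made-up helper name; it does not call ImageMagick at all):

```shell
# Sketch: split a PNG -quality value into the two settings it encodes.
# png_quality_parts is a hypothetical helper name, not an ImageMagick command.
png_quality_parts() {
    q="$1"
    level=$((q / 10))    # tens digit: zlib compression level
    filter=$((q % 10))   # ones digit: PNG filter type
    echo "zlib level $level, filter type $filter"
}

png_quality_parts 75   # prints "zlib level 7, filter type 5"
```

75 is ImageMagick's default PNG quality. Either way the pixels stay identical, so different values only trade file size against encoding time.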

Last edited by Tex2002ans; 09-04-2019 at 08:52 PM.