Quote:
Originally Posted by PHC
OK, I just did a quick test. I extracted 10 pages from a scanned OCRed PDF using Acrobat...
Please post your source and converted files, the exact command you used, the version of Ghostscript you used, and the OS you did the conversion on.
This method was sent to me in an e-mail, and I verified that it worked. So "copied", yes; "blindly copied", no. Why would I change something that works?
Attached are my examples. Command used (on a Windows 10 PC):
Code:
C:\>gs
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GS>quit
C:\>gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=pooh_ghostscript_out.pdf pooh_src.pdf
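For anyone who wants to repeat the conversion on several files, the command above could be wrapped roughly like this. It is only a sketch: the function names are made up for illustration, and it assumes the Ghostscript executable is on the PATH as `gs`, as in the session above (on some Windows installs it is named differently).

```python
import subprocess

def build_gs_command(src_pdf, out_pdf, gs_exe="gs"):
    """Return the argument list for the pdfwrite conversion shown above.

    gs_exe is an assumption: the session above calls the executable "gs".
    """
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        "-sOutputFile=" + out_pdf,
        src_pdf,
    ]

def convert(src_pdf, out_pdf, gs_exe="gs"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(build_gs_command(src_pdf, out_pdf, gs_exe), check=True)

# Only print the command here; call convert() to actually run it.
cmd = build_gs_command("pooh_src.pdf", "pooh_ghostscript_out.pdf")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.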
The PDF information for each file is below. Note that the page bitmap has the same resolution, bit depth, and encoding method in both files.
Code:
C:\>k2pdfopt -i pooh_src.pdf
k2pdfopt v2.33a (w/MuPDF,DjVuLibre,OCR) (c) 2015, GPLv3, http://willus.com
Compiled Oct 3 2015 with Gnu C (Mingw64) v5.2.0 for Win64 on x64.
FILE: pooh_src.pdf
PDF VERSION: 1.3
TITLE: pooh.pdf
CREATED: D:20100328132840
LAST MODIFIED: D:20151108072341-08'00
PDF PRODUCER: K2pdfopt v2.33a
FILE SIZE: 346.4 kB (354,746 bytes)
PAGES: 1
Page Ref Details
Mediaboxes (1):
1 (2 0 R): [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)
Fonts (2):
1 (2 0 R): Type1 'Helvetica' (0 0 R)
1 (2 0 R): Type1 'Helvetica' (0 0 R)
Images (1):
1 (2 0 R): [ Flate ] 949x1207 4bpc DevRGB (5 0 R)
C:\>k2pdfopt -i pooh_ghostscript_out.pdf
k2pdfopt v2.33a (w/MuPDF,DjVuLibre,OCR) (c) 2015, GPLv3, http://willus.com
Compiled Oct 3 2015 with Gnu C (Mingw64) v5.2.0 for Win64 on x64.
FILE: pooh_ghostscript_out.pdf
PDF VERSION: 1.4
CREATED: D:20151108072511-08'00'
LAST MODIFIED: D:20151108072511-08'00'
PDF PRODUCER: GPL Ghostscript 9.02
FILE SIZE: 426.0 kB (436,225 bytes)
PAGES: 1
Page Ref Details
Mediaboxes (1):
1 (4 0 R): [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)
Fonts (1):
1 (4 0 R): Type1 'Helvetica' (9 0 R)
Images (1):
1 (4 0 R): [ Flate ] 949x1207 4bpc DevRGB (8 0 R)
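The comparison of the two "Images" entries can also be done mechanically. The sketch below parses the two image lines copied from the listings above and checks that filter, pixel size, bit depth, and colorspace match. The regular expression is an assumption based only on these two lines; it is not a documented k2pdfopt output format, and the function name is made up for illustration.

```python
import re

# Image entry lines copied from the two k2pdfopt -i listings above.
SRC_LINE = "1 (2 0 R): [ Flate ] 949x1207 4bpc DevRGB (5 0 R)"
OUT_LINE = "1 (4 0 R): [ Flate ] 949x1207 4bpc DevRGB (8 0 R)"

# Assumed layout: [ <filter> ] <width>x<height> <bits>bpc <colorspace>
IMAGE_RE = re.compile(
    r"\[\s*(?P<filter>\w+)\s*\]\s+"
    r"(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<bpc>\d+)bpc\s+"
    r"(?P<colorspace>\w+)"
)

def image_props(line):
    """Extract (filter, width, height, bits per component, colorspace)."""
    m = IMAGE_RE.search(line)
    if m is None:
        raise ValueError("unrecognized image line: " + line)
    return (
        m.group("filter"),
        int(m.group("width")),
        int(m.group("height")),
        int(m.group("bpc")),
        m.group("colorspace"),
    )

src = image_props(SRC_LINE)
out = image_props(OUT_LINE)
print("source:   ", src)
print("converted:", out)
print("identical:", src == out)
```

The object references (5 0 R vs. 8 0 R) are deliberately ignored, since they only reflect how each file numbers its objects.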
PS. I have never argued that cpdf cannot make identical copies of a PDF or that Ghostscript is better at it. I originally posted because the OP wanted to remove cropped content, and the method I posted removes cropped-out text (but not cropped-out images). If cpdf can remove cropped-out areas (without help from Acrobat), then please post how; otherwise, I consider it irrelevant to why I posted.