Old 11-08-2015, 10:35 AM   #18
willus
Fuzzball, the purple cat
Quote:
Originally Posted by PHC
OK, I just did a quick test. I extracted 10 pages from a scanned OCRed PDF using Acrobat...
Please post your source and converted files, the exact command you used, the version of Ghostscript you used, and the OS you did the conversion on. This method was sent to me in an e-mail and I verified that it worked. So "copied", yes; "blindly copied", no. Why would I change something that works?

Attached are my examples. Command used (on a Windows 10 PC):

Code:
C:\>gs
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GS>quit

C:\>gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=pooh_ghostscript_out.pdf pooh_src.pdf
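
For anyone who wants to script the same conversion, here is a minimal Python sketch that builds exactly the command line shown above and nothing more. The helper names (`build_gs_command`, `run_gs`) and the `gs_exe` parameter are my own for illustration; the executable name in particular is an assumption, since it depends on how Ghostscript is installed on your machine.

```python
import subprocess


def build_gs_command(src, out, gs_exe="gs"):
    """Return the argument list for the pdfwrite conversion shown above.

    Only the flags from the posted command are used. `gs_exe` is an
    assumption: the executable may have a different name on your system.
    """
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={out}",
        src,
    ]


def run_gs(src, out, gs_exe="gs"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(build_gs_command(src, out, gs_exe), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Ghostscript.
    print(" ".join(build_gs_command("pooh_src.pdf", "pooh_ghostscript_out.pdf")))
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.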
The resulting PDF information for each file is below. Notice that the page bitmap has identical resolution, bit depth, and encoding method in both files.

Code:
C:\>k2pdfopt -i pooh_src.pdf
k2pdfopt v2.33a (w/MuPDF,DjVuLibre,OCR) (c) 2015, GPLv3, http://willus.com
    Compiled Oct  3 2015 with Gnu C (Mingw64) v5.2.0 for Win64 on x64.

FILE:           pooh_src.pdf
PDF VERSION:    1.3
TITLE:          pooh.pdf
CREATED:        D:20100328132840
LAST MODIFIED:  D:20151108072341-08'00
PDF PRODUCER:   K2pdfopt v2.33a
FILE SIZE:      346.4 kB (354,746 bytes)
PAGES:          1

       Page       Ref           Details
Mediaboxes (1):
        1       (2 0 R):        [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)

Fonts (2):
        1       (2 0 R):        Type1 'Helvetica' (0 0 R)
        1       (2 0 R):        Type1 'Helvetica' (0 0 R)

Images (1):
        1       (2 0 R):        [ Flate ] 949x1207 4bpc DevRGB (5 0 R)


C:\>k2pdfopt -i pooh_ghostscript_out.pdf
k2pdfopt v2.33a (w/MuPDF,DjVuLibre,OCR) (c) 2015, GPLv3, http://willus.com
    Compiled Oct  3 2015 with Gnu C (Mingw64) v5.2.0 for Win64 on x64.

FILE:           pooh_ghostscript_out.pdf
PDF VERSION:    1.4
CREATED:        D:20151108072511-08'00'
LAST MODIFIED:  D:20151108072511-08'00'
PDF PRODUCER:   GPL Ghostscript 9.02
FILE SIZE:      426.0 kB (436,225 bytes)
PAGES:          1

       Page       Ref           Details
Mediaboxes (1):
        1       (4 0 R):        [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)

Fonts (1):
        1       (4 0 R):        Type1 'Helvetica' (9 0 R)

Images (1):
        1       (4 0 R):        [ Flate ] 949x1207 4bpc DevRGB (8 0 R)
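
The comparison can also be checked mechanically. The sketch below parses the image and mediabox lines from the two `k2pdfopt -i` listings above and compares them. The regular expressions are written only against the line layout visible in this post, so treat them as an assumption rather than a description of the k2pdfopt output format in general. The DPI figure further assumes the bitmap fills the whole mediabox, as it does for a scanned page.

```python
import re

# Matches the image descriptor in a line such as
#   1       (2 0 R):        [ Flate ] 949x1207 4bpc DevRGB (5 0 R)
IMAGE_RE = re.compile(
    r"\[\s*(?P<filter>\w+)\s*\]\s+(?P<w>\d+)x(?P<h>\d+)\s+"
    r"(?P<bpc>\d+)bpc\s+(?P<cs>\w+)"
)

# Matches the box in a line such as
#   1       (2 0 R):        [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)
MEDIABOX_RE = re.compile(
    r"\[\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*\]"
)


def image_info(line):
    """Return (filter, width_px, height_px, bits_per_component, colorspace)."""
    m = IMAGE_RE.search(line)
    if m is None:
        raise ValueError(f"not an image line: {line!r}")
    return (m["filter"], int(m["w"]), int(m["h"]), int(m["bpc"]), m["cs"])


def effective_dpi(image_line, mediabox_line):
    """Return (dpi_x, dpi_y), assuming the image covers the full mediabox."""
    _, w, h, _, _ = image_info(image_line)
    m = MEDIABOX_RE.search(mediabox_line)
    if m is None:
        raise ValueError(f"not a mediabox line: {mediabox_line!r}")
    x0, y0, x1, y1 = (float(v) for v in m.groups())
    # PDF user space is 72 units per inch by default.
    return (w / ((x1 - x0) / 72.0), h / ((y1 - y0) / 72.0))


if __name__ == "__main__":
    src_img = "1       (2 0 R):        [ Flate ] 949x1207 4bpc DevRGB (5 0 R)"
    out_img = "1       (4 0 R):        [ Flate ] 949x1207 4bpc DevRGB (8 0 R)"
    box = "1       (2 0 R):        [ 0 0 455.5 579.4 ] (6.33 x 8.05 in)"

    print("identical image encoding:", image_info(src_img) == image_info(out_img))
    dpi_x, dpi_y = effective_dpi(src_img, box)
    print(f"effective resolution: {dpi_x:.1f} x {dpi_y:.1f} dpi")
```

Only the object reference numbers differ between the two image lines, and those are deliberately ignored. With the values above, the bitmap works out to about 150 dpi in both directions.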
PS. I have never argued that cpdf cannot make identical copies of a PDF or that Ghostscript is better at it. I originally posted because the OP wanted to remove cropped content, and the method I posted removes cropped-out text (but not cropped-out images). If cpdf can remove cropped-out areas (without help from Acrobat), then please post how; otherwise I consider it irrelevant to why I posted.
Attached Files
File Type: pdf pooh_src.pdf (346.4 KB, 571 views)
File Type: pdf pooh_ghostscript_out.pdf (426.0 KB, 518 views)

Last edited by willus; 11-08-2015 at 10:42 AM.