Cropping Pages Permanently with Acrobat Pro

Aephqan · 08-07-2011, 01:40 PM

Hi guys,

This is my second post and I wanted to share some information I found recently. I wanted to remove the header and footer but couldn't manage it with Calibre, since the header seemed intricate and highly variable. I then tried cropping the pages with Acrobat, but it seemed to only hide the information rather than actually remove it. I have since found that I can remove the hidden information like this:

"Document -> examine document

There you can delete the hidden information."

Source: http://superuser.com/questions/12756...tly-in-acrobat

It was helpful to me, and I no longer have to throw away PDF files that I couldn't read on my Kindle.

Good luck on that!

(:

sinan · 02-04-2012, 12:59 AM

1. Open your PDF with Adobe Acrobat Pro.

2. Click Tools >> Pages >> Crop
Set the margins and crop the document. You can apply different settings by page range, or to odd and even pages.

3. Once you have cropped your file, click Tools >> Protection >> Remove hidden information.

4. You will see Status: Finding hidden information, then Results. Once all the hidden information has been found, you can check or uncheck each group of information.

5. Click Remove.

6. Save your document. You have now permanently removed the page numbers, headers, and other information that you cropped out.
Note: Dropping the cropped PDF directly into Calibre may not yield good results, especially if your PDF contains Unicode characters. A better option is to convert it to HTML first.

7. Now, let's save the PDF as HTML before converting it into a MOBI or EPUB document. Click File >> Save As >> More Options >> HTML Web Page

If you can't save your file as HTML, make sure you have unchecked "Run OCR if needed". To do that, click "Settings" on the "Save As" screen.

You can do some manual fixes before conversion if you like.

8. Drag and drop the HTML file into Calibre, set the TOC and other options, and convert.

https://www.mobileread.com/forums/sho...d.php?t=160755
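
Step 8 above can also be done from the command line with Calibre's ebook-convert tool instead of drag and drop. The following is only a minimal sketch: the function name html_to_ebook and the EBOOK_CONVERT override variable are made up for illustration, and no conversion options (TOC detection and so on) are set, so the result may need the same tweaking you would do in the GUI.

```shell
#!/bin/sh
# Convert an HTML file saved from Acrobat into an e-book with Calibre's CLI.
# EBOOK_CONVERT is a hypothetical override so the binary can be swapped out;
# by default the real `ebook-convert` command is called.
html_to_ebook() {
    in=$1             # input file, e.g. "book.html"
    fmt=${2:-mobi}    # target extension: mobi (default) or epub
    out="${in%.*}.$fmt"   # same base name, new extension
    "${EBOOK_CONVERT:-ebook-convert}" "$in" "$out"
}
```

For example, html_to_ebook "book.html" epub would run ebook-convert "book.html" "book.epub".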

PHC · 08-10-2014, 06:24 PM

@sinan:

Brilliant solution. I have been searching and trying many different CLI tools but this worked where others utterly failed. Thanks!

Ora · 10-31-2015, 01:34 PM

I know it's an old thread, but I have an issue with large documents that have a lot of cropped hidden information (where I'm extra interested in getting rid of it, so I'd reduce the size); namely, my Acrobat bugs up halfway through and can't complete the process, claiming that the file size is too big. (I don't remember the exact wording now, but I can check it again.) I'm talking about files 100-200 MB in size, so not gigantic, but pretty big. Any idea what I can do there?

willus · 11-01-2015, 12:19 AM

Quote:

Originally Posted by Ora

I know it's an old thread, but I have an issue with large documents that have a lot of cropped hidden information ...

You might take a look at this post in the Briss thread, which talks about how to use Ghostscript to permanently remove cropped areas from a PDF.

PHC · 11-06-2015, 01:08 PM

Quote:

Originally Posted by willus

You might take a look at this post in the Briss thread, which talks about how to use Ghostscript to permanently remove cropped areas from a PDF.

I'll tell you, using GS for this purpose could be a nightmare. You have to fiddle with so many parameters to get a high quality result, especially on images. And even then it will not be lossless.

A far simpler procedure, that will result in lossless quality, is to split the document into pieces small enough for Acrobat to handle without hanging. I have had it hang when OCRing large documents. This despite the fact that I use an 8-core Xeon Mac Pro with 16GB of memory. Adobe really doesn't seem to understand resource management. So what I do is simply OCR 100 pages at a time and save incrementally. This works fine for OCR. Unfortunately, there is no way to choose page ranges when removing hidden information. It's all or nothing. So you have to split the PDF into manageable size documents.

The best tool I have found for this is cpdf. It is free, multiplatform, and works very well.

Assuming you have successfully cropped the entire document and saved it, to split it into say, 100-page documents:

Code:

cpdf -split "My cropped PDF.pdf" 1 -chunk 100 -o "@F %%%.pdf"

This splits it into files of, at most, 100 pages, named 'My cropped PDF 001.pdf', 'My cropped PDF 002.pdf', …

Now you can open each of them in Acrobat and remove the hidden information. After saving all the files, you can merge them:

Code:

cpdf -merge "My cropped PDF 001.pdf" "My cropped PDF 002.pdf" "My cropped PDF 003.pdf" -o "My cropped PDF clean.pdf"

'My cropped PDF clean.pdf' contains the whole document, cropped, with hidden information removed.

You can change the argument to -chunk to whatever number of pages is necessary to allow Acrobat to successfully remove the data and save it. You will have to experiment.
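
With many chunks, typing every file name into the merge command gets tedious. A possible shortcut, sketched below, is to let the shell expand the zero-padded names, which sort in the right order. This assumes a POSIX shell and the three-digit naming produced by the split command above; the function name merge_chunks and the CPDF override variable are made up for illustration.

```shell
#!/bin/sh
# Merge all "<base> NNN.pdf" chunk files into one PDF with cpdf.
# CPDF is a hypothetical override for the cpdf binary name.
merge_chunks() {
    base=$1      # name used in the split step, e.g. "My cropped PDF"
    merged=$2    # output file, e.g. "My cropped PDF clean.pdf"
    # Zero-padded numbers (001, 002, ...) sort correctly under glob expansion.
    set -- "$base "[0-9][0-9][0-9].pdf
    # If nothing matched, the pattern is left unexpanded, so stop here.
    [ -e "$1" ] || { echo "no chunk files found for: $base" >&2; return 1; }
    "${CPDF:-cpdf}" -merge "$@" -o "$merged"
}
```

Calling merge_chunks "My cropped PDF" "My cropped PDF clean.pdf" picks up only the numbered chunks, so neither the original file nor the merged output is included by accident.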

Ora · 11-07-2015, 08:07 AM

Yes, that does sound better, since I have never used GS. However, I have a problem with a small document now. It was 13,5 MB, but there were some cropped pages (it's only 14 pages long, but it's a colour scan), so I thought it could be even smaller, cropped the pages a bit more and told my dear Acrobat (X Pro, if it's relevant) to remove hidden information. Just the cropped stuff, without touching the metadata and links.

When I saved the file... it was... 163 MB.

What happened to increase its size over 10 times?! Do you have any idea what I could do?

I still have the original 13,5 MB file, so we can experiment. I've tried removing hidden information again (without the extra cropping, which were just edges of pages and so on), and it got larger again - only 26,2 MB this time, but the thing that should've reduced its size still doubled it. And the strangest thing is that the same Acrobat did manage to reduce size by removing hidden info from some other files, so I'm completely at a loss here.

willus · 11-07-2015, 08:40 AM

Quote:

Originally Posted by PHC

I'll tell you, using GS for this purpose could be a nightmare. You have to fiddle with so many parameters to get a high quality result, especially on images. And even then it will not be lossless.

Did you try it? With the device set to pdfwrite, there really aren't any special parameters, and the conversion is indeed lossless. It's not converting to a bitmap. But experimenting with it a little more, I see that while it does do a decent job of eliminating cropped-out text, it does not remove images from the PDF file that are completely cropped out, so the size hardly changes if it is dominated by embedded images, which is disappointing.

[Edit, 9 Nov 2015: My statement above is only true if the source PDF uses lossless encodings for the internal bitmaps. See rest of thread.]
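
For readers who have not used Ghostscript, a pdfwrite run looks roughly like the sketch below. The exact parameters from the linked Briss-thread post are not reproduced in this thread, so this is an assumption rather than that recipe: -dUseCropBox is guessed here as the way to make Ghostscript use the crop box as the page size, and the function name gs_rewrite_cropped and the GS override variable are made up for illustration. Check the output file before deleting the original.

```shell
#!/bin/sh
# Rewrite a cropped PDF through Ghostscript's pdfwrite device.
# GS is a hypothetical override for the Ghostscript binary name
# (usually gs on Linux/macOS, gswin64c on 64-bit Windows).
gs_rewrite_cropped() {
    in=$1     # cropped source PDF
    out=$2    # rewritten PDF
    # -o sets the output file and implies -dBATCH -dNOPAUSE.
    # -dUseCropBox (an assumption, see above) uses the CropBox as page size.
    "${GS:-gs}" -o "$out" -sDEVICE=pdfwrite -dUseCropBox "$in"
}
```

As the edit note above says, whether the embedded images survive unchanged depends on how they are encoded in the source PDF.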

Quote:

Originally Posted by PHC

...The best tool I have found for this is cpdf. It is free, multiplatform, and works very well....

Great tip. I had never heard of cpdf. I will check it out. Thank you.

PHC · 11-07-2015, 02:43 PM

Quote:

Originally Posted by willus

Did you try it? With the device set to pdfwrite, there really aren't any special parameters, and the conversion is indeed lossless. It's not converting to a bitmap. But experimenting with it a little more, I see that while it does do a decent job of eliminating cropped-out text, it does not remove images from the PDF file that are completely cropped out, so the size hardly changes if it is dominated by embedded images, which is disappointing.

You mean you just used the default settings? You have a LONG way to go before getting anything even approaching quality output. Take a look at this thread: pdftk compression option - Stack Overflow and ps2pdf: PostScript-to-PDF converter for details.

Quote:

Great tip. I had never heard of cpdf. I will check it out. Thank you.

Yes it is a very powerful and easy-to-use command line tool. You could 'extract pages' in Acrobat but it is not worth the bother if you need to do more than a few pages.

PHC · 11-07-2015, 02:46 PM

Quote:

Originally Posted by Ora

Yes, that does sound better, since I have never used GS. However, I have a problem with a small document now. It was 13,5 MB, but there were some cropped pages (it's only 14 pages long, but it's a colour scan), so I thought it could be even smaller, cropped the pages a bit more and told my dear Acrobat (X Pro, if it's relevant) to remove hidden information. Just the cropped stuff, without touching the metadata and links.

When I saved the file... it was... 163 MB.

What happened to increase its size over 10 times?! Do you have any idea what I could do?

I still have the original 13,5 MB file, so we can experiment. I've tried removing hidden information again (without the extra cropping, which were just edges of pages and so on), and it got larger again - only 26,2 MB this time, but the thing that should've reduced its size still doubled it. And the strangest thing is that the same Acrobat did manage to reduce size by removing hidden info from some other files, so I'm completely at a loss here.

I have no idea. I'd have to try it myself. If the file does not contain any sensitive information, you could upload it to a free cloud service and post the link.

willus · 11-07-2015, 03:51 PM

Quote:

Originally Posted by PHC

You mean you just used the default settings? You have a LONG way to go before getting anything even approaching quality output. Take a look at this thread: pdftk compression option - Stack Overflow and ps2pdf: PostScript-to-PDF converter for details.

The link I posted lists explicit ghostscript command-line parameters which use the "pdfwrite" output device. I was skeptical like you when I first got the tip, but the conversion creates a perfect replica of the PDF source file except for the removal of the cropped-out text. I've used this conversion many times (in fact, it is integrated into k2pdfopt as an option because of the way it removes cropped-out text so nicely).

Quote:

Originally Posted by PHC

Yes it is a very powerful and easy-to-use command line tool. You could 'extract pages' in Acrobat but it is not worth the bother if you need to do more than a few pages.

This program (cpdf) is awesome. It blows away the java-based equivalents on my Windows-based PC: 20x faster than pdfsam and over 100x faster than jpdftweak. And it easily handles PDFs with thousands of pages. No more java-based PDF tools for me. I hope to post a more complete benchmark at some point.

PHC · 11-07-2015, 09:06 PM

Quote:

Originally Posted by willus

The link I posted lists explicit ghostscript command-line parameters which use the "pdfwrite" output device. I was skeptical like you when I first got the tip, but the conversion creates a perfect replica of the PDF source file except for the removal of the cropped-out text. I've used this conversion many times (in fact, it is integrated into k2pdfopt as an option because of the way it removes cropped-out text so nicely).

If you read the post at pdftk compression option - Stack Overflow, you will see that I used even more parameters and got less than lossless results, especially with scanned images. Worst of all, the bookmarks outline (TOC) is removed.

PHC · 11-07-2015, 09:30 PM

Quote:

Originally Posted by willus

This program (cpdf) is awesome.

It really is. It can do a lot and gives lossless results, preserving the TOC and highlighting. Another very useful free tool is PDFtk - The PDF Toolkit. It excels at extracting pages from a PDF, especially if you want pages that are not necessarily contiguous. Acrobat really sucks at that. You can make a list of the pages and page ranges you want, in any order, and extract them to a new PDF in just one line of code. Of course, cpdf can do that too. Another thing both tools can do is add a TOC (bookmarks outline) to a document. The syntax for cpdf is much simpler so that is the best tool for the job. You can completely index a document by simply navigating to a page, copying the text you want as the bookmark title, typing the indentation level, title, and page of the bookmarks in a text file and adding them to the PDF. In Acrobat you have to go to the page, highlight the text, add the bookmark, and adjust the level using the mouse. It takes much longer.
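
To make the bookmark workflow above concrete, here is a small sketch of building a cpdf bookmarks file, where each line holds the level, the quoted title, and the target page. The helper names bookmark_line and write_sample_bookmarks, and the chapter titles and page numbers, are made up for illustration; titles that contain double quotes would need escaping, which this sketch does not handle.

```shell
#!/bin/sh
# Print one line of a cpdf bookmarks file: level, quoted title, target page.
bookmark_line() {
    printf '%s "%s" %s\n' "$1" "$2" "$3"
}

# Write a small sample bookmarks file to the path given in $1.
# Level 0 is a top-level entry, level 1 is nested under the previous level 0.
write_sample_bookmarks() {
    {
        bookmark_line 0 "Chapter 1" 1
        bookmark_line 1 "Section 1.1" 3
        bookmark_line 0 "Chapter 2" 12
    } > "$1"
}

# Apply the file to a PDF with:
#   cpdf -add-bookmarks bookmarks.txt in.pdf -o out.pdf
#
# Extracting a list of pages and ranges into a new PDF, as described above:
#   cpdf in.pdf 5-10,1,20-end -o picked.pdf
#   pdftk in.pdf cat 5-10 1 20-end output picked.pdf
```

In practice you would type the bookmark lines straight into a text editor; the helper only shows the format.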

willus · 11-07-2015, 09:44 PM

Quote:

Originally Posted by PHC

If you read the post at pdftk compression option - Stack Overflow, you will see that I used even more parameters and got less than lossless results, especially with scanned images. Worst of all, the bookmarks outline (TOC) is removed.

My post was about the parameters I used with gs to remove cropped content from a PDF file, which does a lossless translation of the PDF file (excepting the loss of the TOC, as you also mentioned). This method works and I have verified that it works, as I have already said. The parameters you tried--more, less, or different--have no impact on my method or result, so I'm not sure why you continue to bring this up.

PHC · 11-07-2015, 09:51 PM

Quote:

Originally Posted by willus

My post was about the parameters I used with gs to remove cropped content from a PDF file, which does a lossless translation of the PDF file (excepting the loss of the TOC, as you also mentioned). This method works and I have verified that it works, as I have already said. The parameters you tried--more, less, or different--have no impact on my method or result, so I'm not sure why you continue to bring this up.

I bring it up because it re-encodes images and loses the TOC. Both of those are undesirable and eliminated by cpdf. And cpdf is much easier to use.



