Old 08-07-2011, 01:40 PM   #1
Aephqan
Junior Member
Aephqan began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2011
Device: Kindle
Cropping Pages Permanently with Acrobat Pro

Hi guys,

This is my second post and I wanted to share some information I found recently. I wanted to remove the headers and footers from a PDF but couldn't manage it with Calibre, since the header seemed intricate and highly variable. I then tried cropping the pages with Acrobat, but it seemed that cropping only hid the content rather than actually removing it. I have since found that the hidden information can be removed like this:

"Document -> examine document

There you can delete the hidden information."

Source: http://superuser.com/questions/12756...tly-in-acrobat

It was helpful to me, and I'm not gonna throw away the PDF files that I couldn't read on the Kindle anymore.

Good luck on that!

(:
Old 02-04-2012, 12:59 AM   #2
sinan
Enthusiast
sinan has read War And Peace ... all of it
Posts: 23
Karma: 66956
Join Date: Feb 2010
Location: Conn. USA
Device: Kindle 3, Kindle PW
1. Open your PDF with Adobe Acrobat Pro.

2. Click Tools >> Pages >> Crop
Set the margins and crop the document. You can apply the crop to a page range, or to odd or even pages only.

3. Once you have cropped your file, click Tools >> Protection >> Remove Hidden Information.

4. You will see Status: Finding hidden information, then Results. Once all the hidden information has been found, you can check or uncheck each group of information.

5. Click Remove.

6. Save your document. You have now permanently removed the page numbers, headers, and other information that you cropped out.
Note: Dropping the cropped PDF directly into Calibre may not yield good results, especially if the PDF contains Unicode characters. A better option is to convert it to HTML first.

7. Now, let's save our PDF as HTML before converting it to a MOBI or EPUB document. Click File >> Save As >> More Options >> HTML Web Page

If you can't save your file as HTML, make sure you have unchecked "Run OCR if needed". To do that, click "Settings" on the "Save As" screen.

You can do some manual fixes before conversion if you like.

8. Drag and drop the HTML file into Calibre, set the TOC and other options, and convert.
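
Step 8 can also be run from a terminal, which may be handy when several files need converting. This is only a minimal sketch, assuming Calibre's ebook-convert command-line tool is on the PATH. The function name html_to_ebook and the file names are placeholders, not something from the steps above. The output format is taken from the extension of the second argument.

```shell
# Convert the HTML file saved from Acrobat into an e-book with Calibre's CLI.
# $1 = input HTML file, $2 = output file (for example .epub or .mobi).
html_to_ebook() {
    ebook-convert "$1" "$2"
}
```

Usage would look like: html_to_ebook "My cropped PDF.html" "My cropped PDF.epub". The TOC and other conversion options still have to be chosen, just as in the GUI.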

https://www.mobileread.com/forums/sho...d.php?t=160755
Old 08-10-2014, 06:24 PM   #3
PHC
Member
PHC is as sexy as a twisted cruller doughtnut.
 
Posts: 21
Karma: 15000
Join Date: Feb 2014
Device: iPhone, iPad, Macbook Pro, Mac Pro
@sinan:

Brilliant solution. I have been searching and trying many different CLI tools but this worked where others utterly failed. Thanks!
Old 10-31-2015, 01:34 PM   #4
Ora
Junior Member
Ora can read faster than his screen refreshes
 
Posts: 3
Karma: 14228
Join Date: Oct 2015
Device: none
I know it's an old thread, but I have an issue with large documents that have a lot of cropped hidden information (which I'm especially keen to get rid of, to reduce the file size). Acrobat fails halfway through and can't complete the process, claiming that the file is too big. (I don't remember the exact wording now, but I can check it again.) I'm talking about files of 100-200 MB, so not gigantic, but pretty big. Any idea what I can do there?
Old 11-01-2015, 12:19 AM   #5
willus
Fuzzball, the purple cat
willus ought to be getting tired of karma fortunes by now.
Posts: 1,312
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Ora View Post
I know it's an old thread, but I have an issue with large documents that have a lot of cropped hidden information ...
You might take a look at this post in the Briss thread, which talks about how to use Ghostscript to permanently remove cropped areas from a PDF.
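
For readers who have not used Ghostscript, a rewrite through its pdfwrite device looks roughly like the sketch below. The linked post is not reproduced in this thread, so these are not necessarily the parameters it recommends. The flags shown are standard Ghostscript options, the function name gs_rewrite is made up for illustration, and the executable is assumed to be called gs (on Windows it is usually gswin64c). Check the output yourself to confirm that the cropped-out content is really gone.

```shell
# Rewrite a PDF through Ghostscript's pdfwrite device.
# $1 = input PDF, $2 = output PDF. Assumes the executable is named "gs".
gs_rewrite() {
    src=$1
    dst=$2
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$dst" "$src"
}
```

Usage would look like: gs_rewrite "My cropped PDF.pdf" "My cropped PDF gs.pdf". Keep the original file until you have compared the two.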
Old 11-06-2015, 01:08 PM   #6
PHC
Quote:
Originally Posted by willus View Post
You might take a look at this post in the Briss thread, which talks about how to use Ghostscript to permanently remove cropped areas from a PDF.
I'll tell you, using GS for this purpose could be a nightmare. You have to fiddle with so many parameters to get a high quality result, especially on images. And even then it will not be lossless.

A far simpler procedure, and one that gives lossless quality, is to split the document into pieces small enough for Acrobat to handle without hanging. I have had it hang when OCRing large documents. This despite the fact that I use an 8-core Xeon Mac Pro with 16GB of memory. Adobe really doesn't seem to understand resource management. So what I do is simply OCR 100 pages at a time and save incrementally. This works fine for OCR. Unfortunately, there is no way to choose page ranges when removing hidden information. It's all or nothing. So you have to split the PDF into documents of a manageable size.

The best tool I have found for this is cpdf. It is free, multiplatform, and works very well.

Assuming you have successfully cropped the entire document and saved it, to split it into say, 100-page documents:

Code:
cpdf -split "My cropped PDF.pdf" 1 -chunk 100 -o "@F %%%.pdf"
This splits it into files of, at most, 100 pages, named 'My cropped PDF 001.pdf', 'My cropped PDF 002.pdf', …

Now you can open each of them in Acrobat and remove the hidden information. After saving all the files, you can merge them:

Code:
cpdf -merge "My cropped PDF 001.pdf" "My cropped PDF 002.pdf" "My cropped PDF 003.pdf" -o "My cropped PDF clean.pdf"
'My cropped PDF clean.pdf' contains the whole document, cropped, with hidden information removed.

You can change the argument to -chunk to whatever number of pages is necessary to allow Acrobat to successfully remove the data and save it. You will have to experiment.
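
With many chunks, typing every file name into the merge command gets tedious. The sketch below lets the shell collect them instead. It assumes cpdf is on the PATH and that the chunks follow the 'NAME 001.pdf' naming produced by the split command above. The function names chunk_count and merge_chunks are hypothetical helpers, not part of cpdf.

```shell
# How many files to expect from "-chunk SIZE" on a PAGES-page document
# (ceiling division). $1 = PAGES, $2 = SIZE.
chunk_count() {
    pages=$1
    size=$2
    echo $(( (pages + size - 1) / size ))
}

# Merge every "<base> NNN.pdf" chunk in the current directory into
# "<base> clean.pdf". The zero-padded numbers mean the shell's sorted
# glob expansion matches the page order.
merge_chunks() {
    base=$1
    set -- "$base "[0-9][0-9][0-9].pdf
    [ -e "$1" ] || { echo "no chunks found for $base" >&2; return 1; }
    cpdf -merge "$@" -o "$base clean.pdf"
}
```

Usage would look like: merge_chunks "My cropped PDF", run in the folder holding the cleaned chunks. If you saved the cleaned files under different names in Acrobat, adjust the pattern to match.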
Old 11-07-2015, 08:07 AM   #7
Ora
Yes, that does sound better, since I have never used GS. However, I have a problem with a small document now. It was 13,5 MB, but there were some cropped pages (it's only 14 pages long, but it's a colour scan), so I thought it could be even smaller, cropped the pages a bit more and told my dear Acrobat (X Pro, if it's relevant) to remove hidden information. Just the cropped stuff, without touching the metadata and links.

When I saved the file... it was... 163 MB.

What happened to increase its size over 10 times?! Do you have any idea what I could do?

I still have the original 13,5 MB file, so we can experiment. I've tried removing hidden information again (without the extra cropping, which were just edges of pages and so on), and it got larger again - only 26,2 MB this time, but the thing that should've reduced its size still doubled it. And the strangest thing is that the same Acrobat did manage to reduce size by removing hidden info from some other files, so I'm completely at a loss here.
Old 11-07-2015, 08:40 AM   #8
willus
Quote:
Originally Posted by PHC View Post
I'll tell you, using GS for this purpose could be a nightmare. You have to fiddle with so many parameters to get a high quality result, especially on images. And even then it will not be lossless.
Did you try it? With the device set to pdfwrite, there really aren't any special parameters, and the conversion is indeed lossless. It's not converting to a bitmap. But experimenting with it a little more, I see that while it does do a decent job of eliminating cropped-out text, it does not remove images from the PDF file that are completely cropped out, so the size hardly changes if it is dominated by embedded images, which is disappointing.

[Edit, 9 Nov 2015: My statement above is only true if the source PDF uses lossless encodings for the internal bitmaps. See rest of thread.]

Quote:
Originally Posted by PHC View Post
...The best tool I have found for this is cpdf. It is free, multiplatform, and works very well....
Great tip. I had never heard of cpdf. I will check it out. Thank you.

Last edited by willus; 11-09-2015 at 08:10 AM.
Old 11-07-2015, 02:43 PM   #9
PHC
Quote:
Originally Posted by willus View Post
Did you try it? With the device set to pdfwrite, there really aren't any special parameters, and the conversion is indeed lossless. It's not converting to a bitmap. But experimenting with it a little more, I see that while it does do a decent job of eliminating cropped-out text, it does not remove images from the PDF file that are completely cropped out, so the size hardly changes if it is dominated by embedded images, which is disappointing.
You mean you just used the default settings? You have a LONG way to go before getting anything even approaching quality output. Take a look at the Stack Overflow thread "pdftk compression option" and the page "ps2pdf: PostScript-to-PDF converter" for details.

Quote:
Great tip. I had never heard of cpdf. I will check it out. Thank you.
Yes it is a very powerful and easy-to-use command line tool. You could 'extract pages' in Acrobat but it is not worth the bother if you need to do more than a few pages.
Old 11-07-2015, 02:46 PM   #10
PHC
Quote:
Originally Posted by Ora View Post
Yes, that does sound better, since I have never used GS. However, I have a problem with a small document now. It was 13,5 MB, but there were some cropped pages (it's only 14 pages long, but it's a colour scan), so I thought it could be even smaller, cropped the pages a bit more and told my dear Acrobat (X Pro, if it's relevant) to remove hidden information. Just the cropped stuff, without touching the metadata and links.

When I saved the file... it was... 163 MB.

What happened to increase its size over 10 times?! Do you have any idea what I could do?

I still have the original 13,5 MB file, so we can experiment. I've tried removing hidden information again (without the extra cropping, which were just edges of pages and so on), and it got larger again - only 26,2 MB this time, but the thing that should've reduced its size still doubled it. And the strangest thing is that the same Acrobat did manage to reduce size by removing hidden info from some other files, so I'm completely at a loss here.
I have no idea. I'd have to try it myself. If the file does not contain any sensitive information, you could upload it to a free cloud service and post the link.
Old 11-07-2015, 03:51 PM   #11
willus
Quote:
Originally Posted by PHC View Post
You mean you just used the default settings? You have a LONG way to go before getting anything even approaching quality output. Take a look at the Stack Overflow thread "pdftk compression option" and the page "ps2pdf: PostScript-to-PDF converter" for details.
The link I posted lists explicit ghostscript command-line parameters which use the "pdfwrite" output device. I was skeptical like you when I first got the tip, but the conversion creates a perfect replica of the PDF source file except for the removal of the cropped-out text. I've used this conversion many times (in fact, it is integrated into k2pdfopt as an option because of the way it removes cropped-out text so nicely).
Quote:
Originally Posted by PHC View Post
Yes it is a very powerful and easy-to-use command line tool. You could 'extract pages' in Acrobat but it is not worth the bother if you need to do more than a few pages.
This program (cpdf) is awesome. It blows away the Java-based equivalents on my Windows-based PC: 20x faster than pdfsam and over 100x faster than jpdftweak. And it easily handles PDFs with thousands of pages. No more Java-based PDF tools for me. I hope to post a more complete benchmark at some point.
Old 11-07-2015, 09:06 PM   #12
PHC
Quote:
Originally Posted by willus View Post
The link I posted lists explicit ghostscript command-line parameters which use the "pdfwrite" output device. I was skeptical like you when I first got the tip, but the conversion creates a perfect replica of the PDF source file except for the removal of the cropped-out text. I've used this conversion many times (in fact, it is integrated into k2pdfopt as an option because of the way it removes cropped-out text so nicely).
If you read the post at "pdftk compression option" on Stack Overflow, you will see that I used even more parameters and got less than lossless results, especially with scanned images. Worst of all, the bookmarks outline (TOC) is removed.

Last edited by PHC; 11-07-2015 at 09:12 PM.
Old 11-07-2015, 09:30 PM   #13
PHC
Quote:
Originally Posted by willus View Post
This program (cpdf) is awesome.
It really is. It can do a lot and gives lossless results, preserving the TOC and highlighting. Another very useful free tool is PDFtk - The PDF Toolkit. It excels at extracting pages from a PDF, especially if you want pages that are not necessarily contiguous. Acrobat really sucks at that. You can make a list of the pages and page ranges you want, in any order, and extract them to a new PDF in just one line of code. Of course, cpdf can do that too. Another thing both tools can do is add a TOC (bookmarks outline) to a document. The syntax for cpdf is much simpler so that is the best tool for the job. You can completely index a document by simply navigating to a page, copying the text you want as the bookmark title, typing the indentation level, title, and page of the bookmarks in a text file and adding them to the PDF. In Acrobat you have to go to the page, highlight the text, add the bookmark, and adjust the level using the mouse. It takes much longer.
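
The bookmark workflow and the page extraction described above can be sketched as follows. It assumes cpdf is on the PATH. The bookmark file has one entry per line in the form LEVEL "TITLE" PAGE, with level 0 for top-level entries. The titles, page numbers, and function names below are invented for illustration.

```shell
# Write a bookmark list in the text format cpdf reads.
# $1 = path of the bookmark file to create. Entries here are examples only.
write_bookmarks() {
    cat > "$1" <<'EOF'
0 "Chapter 1" 1
1 "Section 1.1" 3
0 "Chapter 2" 15
EOF
}

# Attach a bookmark list to a PDF as its outline (TOC).
# $1 = bookmark file, $2 = input PDF, $3 = output PDF.
add_bookmarks() {
    cpdf -add-bookmarks "$1" "$2" -o "$3"
}

# Extract pages and page ranges, in any order, into a new PDF.
# $1 = input PDF, $2 = range such as 1-3,12,7, $3 = output PDF.
extract_pages() {
    cpdf "$1" "$2" -o "$3"
}
```

Usage would look like: extract_pages in.pdf 1-3,12,7 out.pdf. The pdftk equivalent of that extraction is: pdftk in.pdf cat 1-3 12 7 output out.pdf.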
Old 11-07-2015, 09:44 PM   #14
willus
Quote:
Originally Posted by PHC View Post
If you read the post at "pdftk compression option" on Stack Overflow, you will see that I used even more parameters and got less than lossless results, especially with scanned images. Worst of all, the bookmarks outline (TOC) is removed.
My post was about the parameters I used with gs to remove cropped content from a PDF file, which does a lossless translation of the PDF file (excepting the loss of the TOC, as you also mentioned). This method works and I have verified that it works, as I have already said. The parameters you tried--more, less, or different--have no impact on my method or result, so I'm not sure why you continue to bring this up.
Old 11-07-2015, 09:51 PM   #15
PHC
Quote:
Originally Posted by willus View Post
My post was about the parameters I used with gs to remove cropped content from a PDF file, which does a lossless translation of the PDF file (excepting the loss of the TOC, as you also mentioned). This method works and I have verified that it works, as I have already said. The parameters you tried--more, less, or different--have no impact on my method or result, so I'm not sure why you continue to bring this up.
I bring it up because it re-encodes images and loses the TOC. Both of those are undesirable and eliminated by cpdf. And cpdf is much easier to use.