View Full Version : Auto crop raster image PDF - any software?


bthoven
03-18-2010, 12:42 AM
Not sure this question has been discussed.

I have quite a number of PDF files which are not text; they were created by scanning real paper documents (legal documents). The page size is too big for viewing comfortably on my Nook.

As the page format is consistent on every page, is it possible to automatically crop each page by specifying the area we want to keep? Which software is able to do it easily?

I've attached a sample document for you to look at.

ATimson
03-18-2010, 10:09 AM
It's not free software, but IIRC Adobe Acrobat Professional (and possibly Standard - certainly not Adobe Reader, though) can do what you want.

CoolDragon
03-18-2010, 04:13 PM
For scanned PDF file cropping:

Non-free software: Acrobat Professional, Foxit PDF Editor, etc.

Free: Try this http://code.activestate.com/recipes/576837-crop-pdf-file-with-pypdf/

bthoven
03-18-2010, 10:23 PM
Hi,

Thanks for the suggestion.

I've followed the above link and I'm not sure about the margin parameters. What is the unit of the margin parameter? For example:

pdf-crop.py -m "120 50 120 100" -i mypdf.pdf

What is the unit of 120 50 120 100?

CoolDragon
03-19-2010, 12:17 AM
Actually I don't know the unit. But I always start from 10 and adjust afterwards.

frabjous
03-19-2010, 12:30 AM
I haven't tried pdf-crop.py, but you can do something similar using the pdfmanipulate command-line program that comes with calibre (http://calibre-ebook.com/). The command is this:

pdfmanipulate crop -o "Myfile-cropped.pdf" -x 72 -y 72 -w 72 -v 72 "Myfile.pdf"

Where the filename following -o is what the file will be saved as, the filename at the end is the input file name, and the -x, -y, -w and -v values are the amounts you want to crop from the left, bottom, right and top, respectively. (I hope that's right... it might not be, I haven't checked.)

The amounts are measured in PDF points rather than screen pixels; there are 72 points per inch.

In Windows you may need to put in the full path to pdfmanipulate, i.e.:

32 bit Windows:
"C:\Program Files\Calibre2\pdfmanipulate.exe" crop -o "Myfile-cropped.pdf" -x 72 -y 72 -w 72 -v 72 "Myfile.pdf"

64 bit Windows:
"C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "Myfile-cropped.pdf" -x 72 -y 72 -w 72 -v 72 "Myfile.pdf"

I posted more detailed instructions in this thread (http://www.mobileread.com/forums/showthread.php?p=776523) for using this for cropping all the PDFs in a folder at once with a batch file for Windows and Linux.

I gave instructions both for manually setting the dimensions to crop, and for using Ghostscript to auto-calculate the amount to crop (though that wouldn't work so well for scanned PDFs unless they're exceptionally clean).
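
As a rough illustration of the manual, fixed-margin batch approach (not the batch file from the linked thread), here is a minimal Python sketch that builds one pdfmanipulate command per PDF in a folder, reusing the options from the command above. The function name `batch_crop_commands` and the "-cropped" output naming are assumptions made for this example, and it assumes your calibre install provides pdfmanipulate on the PATH.

```python
from pathlib import Path


def batch_crop_commands(folder, margin=72):
    """Return one pdfmanipulate argument list per PDF found in `folder`.

    `margin` is the amount to crop from every side, in points (72 = 1 inch).
    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith("-cropped"):
            continue  # skip output left over from an earlier run
        output = pdf.with_name(pdf.stem + "-cropped.pdf")
        commands.append([
            "pdfmanipulate", "crop", "-o", str(output),
            "-x", str(margin), "-y", str(margin),
            "-w", str(margin), "-v", str(margin),
            str(pdf),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for the current folder so they can be reviewed first.
    for cmd in batch_crop_commands("."):
        print(cmd)
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)`. Printing them first makes it easy to check the margins on one file before touching the rest.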

You could also try PaperCrop (http://www.mobileread.com/forums/showthread.php?t=31677) and PDFLRF (http://www.mobileread.com/forums/showthread.php?t=13135), which work well more or less automatically with scanned documents. (For the latter, you could use calibre to convert lrf to epub or whatever afterwards.)

Do not use Acrobat for this. Acrobat does not actually crop files. It just pretends to. I.e., it inserts a command to tell its viewer and Adobe Reader to ignore parts of the margins. But these commands are often ignored by reader software, which may well be true of the Nook.
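
For context, the "command" in question is presumably the page's CropBox entry. A PDF page carries a MediaBox (the full page) and optionally a CropBox (the region a viewer is asked to show), each written as four numbers in points: lower-left x, lower-left y, upper-right x, upper-right y. The sketch below only does that rectangle arithmetic; `crop_box` is a hypothetical helper written for this example, not part of Acrobat or any library, and it never touches a file.

```python
def crop_box(media_box, left, bottom, right, top):
    """Shrink a page rectangle by the given margins (all values in points).

    `media_box` is (lower_left_x, lower_left_y, upper_right_x, upper_right_y).
    Returns the rectangle a viewer would be asked to display.
    """
    llx, lly, urx, ury = media_box
    box = (llx + left, lly + bottom, urx - right, ury - top)
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("margins leave no visible area")
    return box


if __name__ == "__main__":
    letter = (0, 0, 612, 792)  # 8.5 x 11 inches at 72 points per inch
    print(crop_box(letter, 72, 72, 72, 72))
```

Recording such a rectangle leaves the scanned image itself intact, so a viewer that ignores the box will still show the full page.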

eksor
03-23-2010, 04:54 AM
Hi:

If you are trying to process scanned documents, I think the best way to handle them is to filter them through scan2pdf or scantailor (thanks to frabjous for this). See this thread for a discussion on this topic.

Another poster in this forum (sorry, I can't remember who) suggested converting them to LRF (Sony's proprietary format; I know you have a Nook) and then back to PDF with calibre. This is because of the auto-cropping and splitting feature of PDFLRF: it tries to remove the margins and split the pages where there is no text.

Finally, I think that the command line tool suggested by frabjous is perhaps the most useful way of cropping efficiently.

Regards

bthoven
03-23-2010, 05:14 AM
Thanks a lot frabjous and eksor.

I've tried the pdfmanipulate.exe in Calibre. It works!

The correct margin parameters are:

x = left
v = right
w = top
y = bottom

Regarding the unit, if I want to cut the margin by 1 inch, I have to specify 72.

Thanks again.
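
The mapping and unit reported above can be captured in a small Python sketch that converts margins given in inches to points and builds the matching pdfmanipulate arguments. The helper names `inches_to_points` and `crop_args` are made up for this example, and the flag meanings are taken from this post rather than checked against calibre's documentation.

```python
POINTS_PER_INCH = 72  # per this thread: cropping 1 inch means specifying 72


def inches_to_points(inches):
    """Convert a margin in inches to a whole number of points."""
    return round(inches * POINTS_PER_INCH)


def crop_args(input_pdf, output_pdf, left=0.0, bottom=0.0, right=0.0, top=0.0):
    """Build the pdfmanipulate argument list; margins are given in inches.

    Flag mapping as reported in this thread:
    -x = left, -y = bottom, -v = right, -w = top.
    """
    return [
        "pdfmanipulate", "crop",
        "-o", output_pdf,
        "-x", str(inches_to_points(left)),
        "-y", str(inches_to_points(bottom)),
        "-v", str(inches_to_points(right)),
        "-w", str(inches_to_points(top)),
        input_pdf,
    ]


if __name__ == "__main__":
    args = crop_args("Myfile.pdf", "Myfile-cropped.pdf",
                     left=1, bottom=0.5, right=1, top=0.5)
    print(" ".join(args))
```

Keeping the margins in inches and converting in one place avoids mixing up which flag belongs to which side.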

slex
03-25-2010, 06:25 AM
If you use Linux you can try pdfshuffler. It is a GUI tool and you can see the portion of the page you crop.

bthoven
03-25-2010, 06:51 AM
Wow! That's really great, WYSIWYG!

Any similar Windows version around?

slex
03-26-2010, 08:49 PM
http://www.pdfill.com/pdf_tools_free.html

I am not sure that it crops image pdfs, however.

bthoven
03-27-2010, 06:02 AM
Hi frabjous,
pdfmanipulate crop also only pretends that the document was cropped. When I view the cropped file on my Nook with the Small font setting, it displays the cropped pages; but to my surprise, it displays the full pages when the font size is set to Medium or Big. I'm fine with this because the cropped file is no bigger than the original file.

frabjous
03-27-2010, 10:29 AM
bthoven wrote: "pdfmanipulate crop also only pretends that the document was cropped."

I don't know what's going on with your Nook, but this is not true.

bthoven wrote: "When I view the cropped file on my Nook with the Small font setting, it displays the cropped pages; but to my surprise, it displays the full pages when the font size is set to Medium or Big. I'm fine with this because the cropped file is no bigger than the original file."

Please note that this thread is about *rasterized* PDFs, not text-based PDFs. There are no "fonts" involved whose sizes can be changed. I really don't know what the Nook does when you change the font size with raster PDFs, since I don't have a Nook, but it certainly might need to change the size of the bounding box in order to get the proportions right for its screen.

bthoven
03-27-2010, 11:07 AM
Yes, I'm talking about rasterized PDFs; otherwise, I'd use sopdf instead.

The Nook actually uses Adobe's ADE reader to display PDFs. If the PDF is text, then changing the font size to a bigger one will start to reflow the text.

I can confirm that my file is a rasterized one, and I was really surprised when I saw my cropped PDF displayed full-page, as if it had not been cropped, when I set the Nook font size to smaller, medium, bigger, or biggest.

On the Nook, setting the font size to small displays the PDF in its original form, in other words the way you intend the file to be displayed (in this case, showing only the cropped area). Setting the font size to any of the others, i.e. smaller, medium, bigger, or biggest, gives quite unpredictable results (in this case, showing the pages as they were before cropping).

mh445
07-24-2010, 04:07 AM
The thread is older, but for cropping rasterized (scanned) PDFs, there is this nice little piece of software: http://sourceforge.net/projects/briss/

eksor
07-30-2010, 05:04 AM
Really nice, many thanks!

chefguru
09-07-2010, 02:55 PM
I've had a similar problem with needing to crop multiple pages. I bought a mobile scanner to scan a bunch of sheet music. The program auto-rotates the PDFs so the lines are straight, which is a great feature. The downside is that when it rotates the image, it leaves black borders, and because it turns the actual page, the final PDF isn't always 8.5x11; sometimes it may be up to an inch wider and taller. If I wanted to reprint those pages, I'd have black triangles on the page.

I've tried a few of the programs other people mentioned here (PDFill being the one most often suggested to me), and if I were just doing what you're looking for, stripping the same amount of border off each page, then it would be fine, but I needed more.

Because each of my pages may be rotated differently, I can't always apply the same cropping rule to every page at once. But A-PDF Page Crop goes one step further than the other programs mentioned here.

You can do the same thing with all the programs, and set a border for every page, and auto-crop that much, but this is the only program I've found that lets you have an actual box on the screen that you can manipulate and move before applying the crop area. You can do each page separately, or just do the first page and apply the same rules to all the other pages you're editing.

It's got a lot of other features too, but that's the one that I use every week, and I think you might find it a little bit easier to use than some of the others listed here.

http://www.a-pdf.com/page-crop/download.htm

Give it a shot, I don't think you'll be disappointed.