Quote:
Originally Posted by ectoplasm
This is actually pretty sweet for automatically cropping text based PDF page margins. This is the first tool I found that does this automatically. If there are others, please comment. I'm not interested in the programs where you have to select a region by hand.
But if the PDF is a scanned image with a text layer behind it, the manual cropping tools are very much of interest, because we often have to crop such a PDF first in Briss, PdfScissors, A-PDF Page Crop, etc., and only then run soPdf or k2pdfopt to get a much better result.
So for an image PDF it is a 2- or 3-step process:
1. Quick OCR in ABBYY, Acrobat, etc. (quick, because there is usually no need for great OCR behind the image).
2. Rough cropping in Briss, eliminating headers/footers if needed (soPdf removes headers/footers such as page numbers automatically).
3. Final cropping in soPdf or k2pdfopt.
Often k2pdfopt alone is enough (i.e. a 1-step process), though, even for a pure image PDF (not OCR-ed).
With soPdf the OCR layer stays in place after cropping and the PDF stays about the same size, i.e. there is no rasterization, which is what makes the PDF bigger with k2pdfopt.
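For anyone wondering what automatic cropping of a scanned page boils down to, here is a minimal Python sketch of the general idea: find the bounding box of everything that is not white and cut to it. This is only an illustration, not how soPdf or k2pdfopt actually work internally. The function names, the list-of-rows grayscale page format and the `tolerance` value are all my own assumptions.

```python
def content_bbox(page, white=255, tolerance=10):
    """Return (top, left, bottom, right) of the non-white area, or None if blank.

    `page` is a hypothetical grayscale bitmap: a list of rows of 0..255 values.
    `bottom` and `right` are exclusive, so they can be used directly in slices.
    """
    threshold = white - tolerance  # anything darker than this counts as ink
    rows = [y for y, row in enumerate(page) if any(p < threshold for p in row)]
    if not rows:
        return None
    width = len(page[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in page)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(page, bbox):
    """Cut the bitmap down to the given bounding box."""
    top, left, bottom, right = bbox
    return [row[left:right] for row in page[top:bottom]]


# Tiny 5x6 "page": white everywhere except two dark pixels.
page = [[255] * 6 for _ in range(5)]
page[1][2] = 0
page[3][4] = 0

bbox = content_bbox(page)
cropped = crop(page, bbox)
```

Note that in a sketch like this a single stray mark (a page number, a header, scanner dirt) enlarges the box, which fits with the advice above to knock out headers/footers roughly before the precise crop.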
Example:
Picture 1 is the original: 8 pages of a scanned, OCR-ed PDF.
Picture 2 is the original cropped in Briss (only roughly, i.e. not getting very close to the text itself, but with the headers cropped).
Picture 3 is the original cropped in Briss and then cropped further in soPdf (fit to height).
Picture 4 is the original cropped directly in soPdf.
[Pictures 1 to 4]
As we can see, soPdf did not cut the left margins on two pages when applied directly (picture 4), whereas after cropping in Briss, soPdf cropped those two margins correctly, and Briss also eliminated the headers/footers.
Briss and soPdf or k2pdfopt are complementary: in Briss there are usually pages that stick out (by an inch or half an inch from the stacked majority of odd or even pages), and we can freely include them all in the crop region if soPdf or k2pdfopt will be run after Briss for very precise cropping.
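To put numbers on the "pages that stick out" point, here is a rough Python sketch. It assumes one shared crop rectangle is applied to a whole stack of pages (as when you drag a single region over stacked pages) and shows how much margin that leaves on the well-behaved pages, which is exactly what a second per-page pass would then trim. The box values are made up for illustration (in PDF points, 72 pt = 1 inch), and the function names are hypothetical; this is not the actual code of Briss, soPdf or k2pdfopt.

```python
def union_box(boxes):
    """Smallest (left, bottom, right, top) rectangle containing every box."""
    lefts, bottoms, rights, tops = zip(*boxes)
    return (min(lefts), min(bottoms), max(rights), max(tops))


def leftover_margins(content, crop_box):
    """Whitespace left on each side (left, bottom, right, top) after cropping."""
    return (
        content[0] - crop_box[0],
        content[1] - crop_box[1],
        crop_box[2] - content[2],
        crop_box[3] - content[3],
    )


# Hypothetical text areas of three stacked pages; the third one sticks out
# half an inch (36 pt) to the left of the others.
pages = [
    (72, 72, 540, 720),
    (72, 72, 540, 720),
    (36, 72, 504, 720),
]

shared = union_box(pages)  # one rough crop that is safe for all pages
leftovers = [leftover_margins(p, shared) for p in pages]
```

With these numbers the shared crop leaves 36 pt of extra left margin on the first two pages and 36 pt of extra right margin on the third. Nothing is cut off, and the precise per-page crop afterwards takes care of the rest.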