"Cleansing" of PDF files
02-17-2010, 03:14 PM
I would like to do this to close the HUGE security hole that exists, so i do not accidentally pass on a PDF with a bug/virus embedded in it (it can be as simple as a 'tracker' or as sophisticated as a 'phone home w/ user email', etc.).
I am currently using okular (i use a KDE desktop) set with heavy restrictions to open the file initially, and if an AcroForm is present, i then re-open the file with PDFedit to delete the form (where there are no restrictions set on the PDF file). this is rather tedious, to say the least.
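The manual "open it and look for an AcroForm" step could be roughed out in Python along these lines. This is only a sketch: the list of names is an assumption picked for illustration, and a plain byte scan will miss anything stored in compressed object streams or written with #xx hex escapes, so a clean result proves nothing.

```python
import re

# PDF name objects that commonly signal active content. This list is an
# assumption chosen for illustration, not an exhaustive inventory.
RISKY_NAMES = (b"/AcroForm", b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_risky_names(data):
    """Return the risky names that appear as whole PDF name tokens in data."""
    found = []
    for name in RISKY_NAMES:
        # Reject matches that continue with more name characters,
        # so that /JS does not fire on a longer name such as /JSONish.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_.#-])"
        if re.search(pattern, data):
            found.append(name.decode("ascii"))
    return found


def scan_file(path):
    """Read a PDF from disk and report which risky names it contains."""
    with open(path, "rb") as handle:
        return find_risky_names(handle.read())
```

Calling scan_file on each incoming file would at least tell you which ones need the PDFedit treatment before you open them.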
and being the lazy person i am, i was hoping that others will have solved this problem for me !
and if anyone knows a good tool to remove the restrictions (batch would be nice...) on PDF files, that would also make things easier than PDF --> lrf --> PDF !
thank you, in advance, for any help/pointers,
02-17-2010, 03:37 PM
The command line tool pdftk (http://www.accesspdf.com/pdftk/) should be able to remove the forms via "flatten".
I'm sure there are better things to try, though.
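For reference, the flatten call is just "pdftk <input pdf> output <output pdf> flatten". A minimal Python wrapper, assuming pdftk is installed and on the PATH (the function names here are made up for the example), might look like this. Flattening merges the form fields into the page content; whether it also gets rid of embedded scripts is a separate question that this sketch does not answer.

```python
import subprocess


def flatten_command(src, dst):
    """Build the pdftk argument list that flattens form fields from src into dst."""
    return ["pdftk", src, "output", dst, "flatten"]


def flatten(src, dst):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    subprocess.run(flatten_command(src, dst), check=True)
```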
02-17-2010, 04:19 PM
(Actually, unless you're already familiar with iText, that looks a bit intimidating... well... take your pick.)
02-18-2010, 12:15 PM
first thank you for the speedy response.
after looking at pdfstripper.jar, it does not look like the tool i was hoping for. i would have to rename my file before i run it, and rename it after it is run. it appears to be a wrapper around the itext class library, that is then in turn run by a wrapper around a java tool (ant). i was hoping for something a little 'cleaner' (i.e. <command> <input pdf> <output pdf> ).
(this is more of a philosophical thing. i am the lead architect on a 300k plus line java-based open source project...)
i am wondering if 'pdftk flatten' is the correct tool, as it has not been updated in quite some time (nov 2006) , and might not be 'script aware' in most cases.
looks like my choices are (in no order)
convert to a better 'print ready' format, and back again.
bite the bullet and write a program that uses the itext class myself.
(in all my copious free time not spent changing diapers, babysitting engineers - no, they are not the ones in diapers, and launching a startup.)
be the one to write the bash script that uses pdftk to flatten --> filter out JavaScript using sed/awk/etc --> re-compress
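The last option (flatten --> filter --> re-compress) could be sketched roughly as below, in Python rather than bash/sed so the filter step is easier to test. Everything here is an assumption for illustration: the list of names, the function names, and the trick of swapping the case of a name so a viewer no longer recognises it while the byte length (and so the file offsets) stays the same. It relies on pdftk's flatten, uncompress and compress output options. It is naive in that it will also touch matching bytes inside binary streams and will miss hex-escaped names.

```python
import os
import re
import subprocess
import tempfile

# Names to neutralise; an illustrative assumption, not a complete list.
ACTIVE_NAMES = re.compile(rb"/(JavaScript|JS|OpenAction|AA|Launch)(?![A-Za-z0-9_.#-])")


def neutralise(data):
    """Swap the case of active-content names, keeping the byte length unchanged."""
    return ACTIVE_NAMES.sub(lambda m: b"/" + m.group(1).swapcase(), data)


def pdftk_command(src, dst, option):
    """Build a pdftk call; option is a pdftk output option such as 'flatten'."""
    return ["pdftk", src, "output", dst, option]


def cleanse(src, dst):
    """Flatten, uncompress, filter names, then re-compress into dst."""
    with tempfile.TemporaryDirectory() as tmp:
        flat = os.path.join(tmp, "flat.pdf")
        plain = os.path.join(tmp, "plain.pdf")
        filtered = os.path.join(tmp, "filtered.pdf")
        subprocess.run(pdftk_command(src, flat, "flatten"), check=True)
        subprocess.run(pdftk_command(flat, plain, "uncompress"), check=True)
        with open(plain, "rb") as handle:
            data = handle.read()
        with open(filtered, "wb") as handle:
            handle.write(neutralise(data))
        # The final pdftk pass rewrites the file and compresses the streams again.
        subprocess.run(pdftk_command(filtered, dst, "compress"), check=True)
```

That gives the <command> <input pdf> <output pdf> shape: cleanse("in.pdf", "out.pdf").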
and on a timely note, saw this article referenced on slashdot a few hours after i posted my plea for help:
Rogue PDFs account for 80% of all exploits, says researcher (http://www.computerworld.com/s/article/9157438/Rogue_PDFs_account_for_80_of_all_exploits_says_researcher)
from the article:
Computerworld - Just hours before Adobe is slated to deliver the latest patches for its popular PDF viewer, a security firm announced that by its counting, malicious Reader documents made up 80% of all exploits at the end of 2009.
According to ScanSafe of San Bruno, Calif., vulnerabilities in Adobe's Reader and Acrobat applications were the most frequently targeted of any software during 2009, with hackers' PDF exploits growing throughout the year.
In the first quarter of 2009, malicious PDF files made up 56% of all exploits tracked by ScanSafe. That figure climbed above 60% in the second quarter, over 70% in the third and finished at 80% in the fourth quarter.
looks like i need this tool more than ever !
02-18-2010, 03:27 PM
Couldn't you automate the renaming and the rest of it with a bash script (or python/perl/sh, etc.)?
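Something like the following is what I have in mind, sketched in Python. The "tool" argument is a placeholder for whatever cleaner ends up being used: any callable that takes (input path, output path). The function names are invented for the example. The wrapper writes to a temporary file next to the original and then swaps it into place, so the renaming never has to be done by hand.

```python
import os
import shutil
import tempfile


def clean_in_place(path, tool):
    """Run tool(src, dst) into a temporary file, then swap the result into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    os.close(fd)
    try:
        tool(path, tmp_path)
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        os.replace(tmp_path, path)       # rename over the original
    finally:
        # Only still present if the tool failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def clean_tree(root, tool):
    """Apply clean_in_place to every .pdf under root; return the paths processed."""
    done = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".pdf"):
                full = os.path.join(folder, name)
                clean_in_place(full, tool)
                done.append(full)
    return done
```

Note that this overwrites the originals, so it is worth running it on a copy of the directory first.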
Well, if you do decide to write something yourself, let us know!
The other methods I mentioned are probably good enough for my purposes.
What is this open source project if I may be nosy?