02-17-2010, 03:14 PM   #1
psychomike
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2008
Location: Silicon Valley
Device: PRS-700, G1
"Cleansing" of PDF files

Hello all,

Before I sit down and write the tool myself, I was wondering if anyone has come across a batch tool (command line would be nice; failing that, something that scans a folder/directory) that will automagically strip out any AcroForm/JavaScript embedded in a PDF?

I would like to do this to close the HUGE security hole that exists, so that I do not accidentally pass on a PDF file with a bug/virus embedded in it (it can be as simple as a 'tracker' or as sophisticated as a 'phone home w/ user email', etc.).

I am currently using Okular (I use a KDE desktop), set with heavy restrictions, to open the file initially. If an AcroForm is present, I then re-open the file with PDFedit to delete the form (where the PDF file has no restrictions set). This is rather tedious, to say the least.
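The triage step (open the file, check whether it carries an AcroForm, then act) could be scripted over a whole folder. Below is a minimal Python sketch that walks a directory and counts a few PDF name tokens that commonly signal forms or scripts. The token list, including `/OpenAction` and `/AA` (auto-run actions), is an assumption, and a raw byte scan is only a heuristic: keys hidden inside compressed object streams or written with `#xx` hex escapes will not be found, so an empty report does not prove a file is clean.

```python
import re
from pathlib import Path

# Name tokens that usually signal forms, scripts or auto-run actions.
# This list is an assumption for triage, not an exhaustive set.
SUSPICIOUS = [b"/AcroForm", b"/JavaScript", b"/JS", b"/OpenAction", b"/AA"]

# A PDF name ends at whitespace or a delimiter; this lookahead stops /JS
# from matching inside a longer name such as /JSON.
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def find_markers(data: bytes) -> dict:
    """Count occurrences of each suspicious name token in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS:
        n = len(re.findall(re.escape(name) + NAME_END, data))
        if n:
            counts[name.decode("ascii")] = n
    return counts


def scan_directory(root: str) -> dict:
    """Map each *.pdf below root (recursively) to its non-empty marker counts."""
    report = {}
    for path in sorted(Path(root).rglob("*.pdf")):
        markers = find_markers(path.read_bytes())
        if markers:
            report[str(path)] = markers
    return report
```

For example, `print(scan_directory("/path/to/pdfs"))` would list only the files worth a closer look in a restricted viewer. Note that `rglob("*.pdf")` is case-sensitive on most Unix filesystems, so `*.PDF` files would need a second pattern.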

And being the lazy person I am, I was hoping that others had already solved this problem for me!

And if anyone knows a good tool to remove the restrictions on PDF files (batch would be nice...), that would also make things easier than PDF --> LRF --> PDF!

Thank you, in advance, for any help/pointers,

-michael
02-17-2010, 03:37 PM   #2
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
The command-line tool pdftk should be able to remove the forms via its "flatten" option.
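Flattening a whole folder is easy to script. A minimal Python sketch, assuming pdftk is on the PATH; it relies only on pdftk's `flatten` output option, and since it is not clear that flattening touches JavaScript at all, treat it as a form remover only. The `dry_run` default is an illustrative choice so the generated commands can be inspected before anything runs.

```python
import subprocess
from pathlib import Path


def flatten_command(src, dst) -> list:
    """pdftk call that merges form fields into the page content."""
    return ["pdftk", str(src), "output", str(dst), "flatten"]


def flatten_folder(in_dir, out_dir, dry_run=True) -> list:
    """Flatten every *.pdf in in_dir into out_dir, keeping the file names."""
    out = Path(out_dir)
    commands = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        cmd = flatten_command(src, out / src.name)
        commands.append(cmd)
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```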

I'm not sure whether it can be used to remove JavaScript. Hmm... personally I'd probably try one of the following two things, but only because I don't know any better:
  1. Use pdflatex with the pdfpages package to include the PDF in a "new one", which I'm pretty sure (not positive) would strip its JavaScript in the process.
  2. Use the Ghostscript commands pdf2ps and ps2pdf to convert from PDF to PS and back again, which I think would have the effect of removing the JavaScript (and preserve a lot more than converting to LRF would!).
Either could easily be put in, e.g., a bash script for batch processing from the command line.
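Both routes are easy to drive from a script. A minimal Python sketch, assuming Ghostscript's `pdf2ps`/`ps2pdf` wrapper scripts and a TeX installation with `pdfpages` are available; the intermediate file name is a placeholder. The `.tex` source produced by `pdfpages_wrapper` would then be compiled with `pdflatex`, and file names containing LaTeX special characters may need extra care. Neither route is proven to remove every trace of JavaScript; both rebuild the file from its page content, so anything that lives outside the pages is likely to be dropped.

```python
import subprocess
import tempfile
from pathlib import Path


def roundtrip_commands(src, dst, tmp_ps) -> list:
    """Ghostscript wrappers: PDF -> PostScript, then PostScript -> PDF."""
    return [["pdf2ps", str(src), str(tmp_ps)],
            ["ps2pdf", str(tmp_ps), str(dst)]]


def ghostscript_roundtrip(src, dst) -> None:
    """Re-render src through PostScript; the .ps file lives in a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        ps_file = Path(tmp) / "intermediate.ps"
        for cmd in roundtrip_commands(src, dst, ps_file):
            subprocess.run(cmd, check=True)


def pdfpages_wrapper(src) -> str:
    """LaTeX source that re-includes every page of src via pdfpages."""
    return (
        "\\documentclass{article}\n"
        "\\usepackage{pdfpages}\n"
        "\\begin{document}\n"
        f"\\includepdf[pages=-]{{{src}}}\n"
        "\\end{document}\n"
    )
```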

I'm sure there are better things to try, though.

Do you have a PDF with JavaScript in it that I can test with?

(EDIT: I tested both methods with the JavaScript calculator PDF here, and both successfully broke the calculator, but I'm not sure whether any JavaScript was left.)

(EDIT 2: I uncompressed the results of both methods with pdftk and examined them, and didn't see any JavaScript in either, but I'm not the most competent judge.)

(EDIT 3: Someone cleverer than I could probably use pdftk to uncompress the PDF, use a command-line text/stream editor like sed or awk to strip the JavaScript, and then recompress it.)
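A sketch of that filter step in Python rather than sed/awk, assuming the file has already been through `pdftk ... uncompress` so the relevant keys appear as plain text. Instead of deleting anything, it renames each key to an unknown name of the same length (for example `/JS` becomes `/Jx`). Keeping the length means every byte offset, and therefore the cross-reference table, stays valid, and a viewer should ignore keys it does not recognize. The key list is an assumption, the script bodies themselves remain in the file as dead data, and hex-escaped names such as `/J#53` are not handled.

```python
import re
from pathlib import Path

# Keys that carry or trigger scripts. This list is an assumption.
KEYS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA"]

# A PDF name ends at whitespace or a delimiter, so /JS must not match /JSON.
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def neutralize(data: bytes) -> bytes:
    """Rename script-related keys without changing the length of the data."""
    for key in KEYS:
        # Swap the last character for 'x', e.g. /JavaScript -> /JavaScripx.
        renamed = key[:-1] + b"x"
        data = re.sub(re.escape(key) + NAME_END, renamed, data)
    return data


def neutralize_file(src, dst) -> None:
    """Read an uncompressed PDF, neutralize it, write the result to dst."""
    Path(dst).write_bytes(neutralize(Path(src).read_bytes()))
```

Afterwards, `pdftk cleaned.pdf output final.pdf compress` would recompress the result.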

Last edited by frabjous; 02-17-2010 at 04:12 PM.
02-17-2010, 04:19 PM   #3
frabjous
Wizard
Aha!

Even better: PDF Java Script stripper -- a Java program; should be platform-independent.

(Actually, unless you're already familiar with iText, that looks a bit intimidating... well... take your pick.)

Last edited by frabjous; 02-17-2010 at 04:29 PM.
02-18-2010, 12:15 PM   #4
psychomike
Junior Member
frabjous,

First, thank you for the speedy response.

After looking at pdfstripper.jar, it does not look like the tool I was hoping for. I would have to rename my file before I run it, and rename it again after it has run. It appears to be a wrapper around the iText class library, which is in turn run by a wrapper around a Java tool (Ant). I was hoping for something a little 'cleaner' (i.e. <command> <input pdf> <output pdf>).

(This is more of a philosophical thing: I am the lead architect on a 300k-plus-line, Java-based open source project...)

I am wondering whether 'pdftk flatten' is the correct tool, as it has not been updated in quite some time (Nov 2006) and might not be 'script aware' in most cases.

Looks like my choices are (in no particular order):
  1. Convert to a better 'print ready' format, and back again.
  2. Bite the bullet and write a program that uses the iText class library myself.
    (In all my copious free time not spent changing diapers, babysitting engineers (no, they are not the ones in diapers) and launching a startup.)
  3. Be the one to write the bash script that uses pdftk to flatten --> filter out JavaScript using sed/awk/etc. --> re-compress.
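Option 3 does not have to be bash. Here is a minimal Python sketch of that pipeline, assuming pdftk is installed: one pdftk call that flattens and uncompresses (pdftk accepts `flatten` and `uncompress` together as output options), a pluggable byte-level filter standing in for the sed/awk stage, and a second pdftk call that recompresses. The temporary file names are arbitrary, and `run` is a parameter only so the steps can be exercised without pdftk.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def clean_pdf(src, dst, filter_bytes: Callable[[bytes], bytes],
              run=subprocess.run) -> None:
    """pdftk flatten+uncompress -> byte filter -> pdftk compress."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "plain.pdf"
        filtered = Path(tmp) / "filtered.pdf"
        # Step 1: merge form fields into the pages and expose streams as text.
        run(["pdftk", str(src), "output", str(plain), "flatten", "uncompress"],
            check=True)
        # Step 2: the sed/awk stage, done in-process on the raw bytes.
        filtered.write_bytes(filter_bytes(plain.read_bytes()))
        # Step 3: recompress into the final file.
        run(["pdftk", str(filtered), "output", str(dst), "compress"], check=True)
```

Batch use would then be a loop such as `for src in Path("in").glob("*.pdf"): clean_pdf(src, Path("out") / src.name, my_filter)`, where `my_filter` is whatever hypothetical filter function you settle on.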


*sigh*

And on a timely note, I saw this article referenced on Slashdot a few hours after I posted my plea for help.
From the article:
Computerworld - Just hours before Adobe is slated to deliver the latest patches for its popular PDF viewer, a security firm announced that by its counting, malicious Reader documents made up 80% of all exploits at the end of 2009.

According to ScanSafe of San Bruno, Calif., vulnerabilities in Adobe's Reader and Acrobat applications were the most frequently targeted of any software during 2009, with hackers' PDF exploits growing throughout the year.

In the first quarter of 2009, malicious PDF files made up 56% of all exploits tracked by ScanSafe. That figure climbed above 60% in the second quarter, over 70% in the third and finished at 80% in the fourth quarter.

Looks like I need this tool more than ever!

-michael
02-18-2010, 03:27 PM   #5
frabjous
Wizard
Couldn't you automate the renaming and the rest of it with a bash script (or Python/Perl/sh, etc.)?
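One way to hide the renaming behind the `<command> <input pdf> <output pdf>` interface asked for earlier, sketched in Python. Everything about the wrapped tool here is hypothetical: the working directory, the fixed file names `input.pdf`/`output.pdf`, and the `ant` invocation are placeholders to be replaced with whatever the tool actually expects.

```python
import argparse
import shutil
import subprocess
from pathlib import Path

# Hypothetical placeholders: replace with what the wrapped tool really uses.
FIXED_INPUT = "input.pdf"
FIXED_OUTPUT = "output.pdf"
TOOL_CMD = ["ant"]


def run_fixed_name_tool(src, dst, tool_dir, cmd=TOOL_CMD, run=subprocess.run):
    """Copy src to the name the tool expects, run it, move its output to dst."""
    work = Path(tool_dir)
    staged = work / FIXED_INPUT
    shutil.copyfile(src, staged)  # the "rename before"
    try:
        run(cmd, cwd=str(work), check=True)
        shutil.move(str(work / FIXED_OUTPUT), str(dst))  # the "rename after"
    finally:
        # Remove the staged copy so the next run starts clean.
        if staged.exists():
            staged.unlink()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run a fixed-file-name PDF tool as <input pdf> <output pdf>.")
    parser.add_argument("input", help="PDF to process")
    parser.add_argument("output", help="where to write the result")
    parser.add_argument("--tool-dir", default=".",
                        help="directory the wrapped tool runs in")
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    run_fixed_name_tool(args.input, args.output, args.tool_dir)
```

Calling `main()` from a small launcher (or under the usual `if __name__ == "__main__":` guard) gives the one-line command shape; a folder then becomes a loop over `Path(...).glob("*.pdf")`.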

Well, if you do decide to write something yourself, let us know!

The other methods I mentioned are probably good enough for my purposes.

What is this open source project if I may be nosy?


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.