Remove background images from PDFs? Perhaps all images?
Hi,
I have a few PDFs I can't read because background images obscure the text. I don't expect a solution for scanned PDFs, but I've tried to find one for born-digital PDFs and have been beset with bugs.
In Ghostscript, I've tried:

```sh
gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[output.pdf] [input.pdf]
```
Not only do I lose the raster and vector images, I also lose about half the text. Quick checks confirm the missing text wasn't rasterized images of text.
I've also tried a two-step process, first:

```sh
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[intermediate.pdf] [input.pdf]
```

*and then*, on the result of the first pass:

```sh
gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[output.pdf] [intermediate.pdf]
```
This way I lose about one-twentieth of the text instead of half, but that's still too much. On top of that, the output usually shows only the lower-left corner of the page, blown up to fill the whole page.
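To make the two-step attempt easy to reproduce, here is a minimal sketch that wraps both Ghostscript passes in one POSIX shell function, feeding the first pass's output into the second through a temporary file. The function name `strip_graphics` is hypothetical; the flags are exactly the ones used above, and the sketch only automates the attempt rather than fixing the missing text.

```sh
#!/bin/sh
# Hypothetical helper: runs the two Ghostscript passes described above,
# passing the output of pass 1 to pass 2 through a temporary file.
strip_graphics() {
  input="$1"
  output="$2"
  tmp="$(mktemp)" || return 1

  # Pass 1: plain rewrite through the pdfwrite device (no filtering).
  if ! gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$tmp" "$input"; then
    rm -f "$tmp"
    return 1
  fi

  # Pass 2: rewrite the intermediate file again, filtering out
  # raster images (-dFILTERIMAGE) and vector graphics (-dFILTERVECTOR).
  gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 \
     -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$output" "$tmp"
  status=$?

  # Clean up the intermediate file and report pass 2's exit status.
  rm -f "$tmp"
  return "$status"
}
```

After sourcing the file, run it as `strip_graphics input.pdf output.pdf`.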
I've also tried `mutool clean -d -l -g`, cpdf with explicit page sizes (plus `-blacktext` to avoid white text on white backgrounds), and Ghostscript with explicit page sizes, but none of these solve the problem.
Any suggestions?