10-17-2018, 12:37 AM   #1
MarjaE
Remove background images from PDFs? Perhaps all images?

Hi,

I have a few PDFs I can't read because background images obscure the text. I don't expect any solution for scanned PDFs, but I've tried to find one for born-digital PDFs and have been beset with bugs.

In Ghostscript, I've tried:

gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[output.pdf] [input.pdf]

I don't just lose the raster and vector images; I lose about half the text too. Quick checks confirm the missing text wasn't raster images of text.
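
To narrow down which flag takes the text with it, each filter could be tried on its own. The following is only a sketch: `gs_filter` is a hypothetical helper name, `input.pdf` and the output names are placeholders, and `DRY_RUN=1` is an assumed convenience that prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: run pdfwrite with only the filter flags given.
# Usage: gs_filter INPUT OUTPUT FLAG...
# With DRY_RUN=1 the command is printed rather than executed.
gs_filter() {
    src=$1
    dst=$2
    shift 2
    # Whatever arguments remain are filter flags, e.g. -dFILTERIMAGE.
    set -- gs -sDEVICE=pdfwrite "$@" -dCompatibilityLevel=1.4 \
        -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$dst" "$src"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply each filter alone; input.pdf is a placeholder file name.
if [ -f input.pdf ]; then
    gs_filter input.pdf no-images.pdf  -dFILTERIMAGE
    gs_filter input.pdf no-vectors.pdf -dFILTERVECTOR
fi
```

If the text survives in `no-images.pdf` but not in `no-vectors.pdf`, that would point at `-dFILTERVECTOR` rather than `-dFILTERIMAGE`, though this is a diagnostic step and not a fix.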

I've also tried a 2-step process, first a plain rewrite:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[intermediate.pdf] [input.pdf]

*and then* filtering the rewritten file:

gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=[output.pdf] [intermediate.pdf]

Now I lose about one-twentieth of the text instead of half, but that's still too much. I usually end up with the lower left corner of the page blown up to fill the whole page.
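
For anyone wanting to reproduce the 2-step process, it can be wrapped up roughly as below. This is a sketch under assumptions: `two_pass` and `run` are hypothetical helper names, the `.pass1.pdf` intermediate name is arbitrary, and `DRY_RUN=1` only prints the two commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper. Usage: two_pass INPUT OUTPUT
two_pass() {
    src=$1
    dst=$2
    mid="${dst%.pdf}.pass1.pdf"   # intermediate file; the name is arbitrary
    common="-sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH"
    # Pass 1: plain rewrite through pdfwrite, no filtering.
    # $common is deliberately unquoted so it splits into separate flags.
    run gs $common "-sOutputFile=$mid" "$src" || return 1
    # Pass 2: filter the rewritten file, not the original.
    run gs $common -dFILTERIMAGE -dFILTERVECTOR "-sOutputFile=$dst" "$mid"
}

# input.pdf is a placeholder file name.
if [ -f input.pdf ]; then
    two_pass input.pdf output.pdf
fi
```

The second pass reads the intermediate file produced by the first, so the original is never overwritten and the two outputs can be compared page by page.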

I've also tried mutool clean -d -l -g, cpdf with specified page sizes (and -blacktext to avoid white text on white backgrounds), and Ghostscript with specified page sizes, but none of these solves the problem.
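
The exact page-size command isn't shown above, so the following is only an assumed form of a Ghostscript run with a fixed page size. `gs_fit_page` is a hypothetical helper name, the 612x792 point default (US Letter) is an assumption, and `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper. Usage: gs_fit_page INPUT OUTPUT [WIDTH_PT HEIGHT_PT]
# Forces a fixed media size and scales each page to fit it.
gs_fit_page() {
    src=$1
    dst=$2
    width=${3:-612}    # assumed default: US Letter width in points
    height=${4:-792}   # assumed default: US Letter height in points
    set -- gs -sDEVICE=pdfwrite \
        "-dDEVICEWIDTHPOINTS=$width" "-dDEVICEHEIGHTPOINTS=$height" \
        -dFIXEDMEDIA -dPDFFitPage \
        -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$dst" "$src"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# input.pdf is a placeholder file name.
if [ -f input.pdf ]; then
    gs_fit_page input.pdf fitted.pdf
fi
```

`-dFIXEDMEDIA` keeps the requested size and `-dPDFFitPage` scales the page content into it; whether that helps with the blown-up lower left corner is untested here.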

Any suggestions?