12-11-2021, 12:03 PM   #5
Shohreh
Found it: You must first decompress the PDF:

Code:
mutool.exe clean -d -a original.pdf original.decompressed.pdf
This worked to replace the two strings with empty strings:

Code:
# -*- coding: latin-1 -*-

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.pdf import ContentStream
from PyPDF2.generic import TextStringObject, NameObject
from PyPDF2.utils import b_
 
string1 = "Licence  blah"
string2 = "blah blah blah"

# Load PDF for reading
source = PdfFileReader("original.decompressed.pdf")
output = PdfFileWriter()
 
# Iterate over the pages by index
for page_number in range(source.getNumPages()):
	print("Handling page", page_number)
	page = source.getPage(page_number)
	# Parse the page's content stream into (operands, operator) pairs
	content_object = page["/Contents"].getObject()
	content = ContentStream(content_object, source)
	for operands, operator in content.operations:
		# Tj is the "show text" operator; its single operand is the string
		if operator == b_("Tj"):
			print("Found")
			text = operands[0]
			if isinstance(text, TextStringObject) and text.startswith((string1, string2)):
				print("Replace")
				operands[0] = TextStringObject("")
	# Swap the page's content for the edited stream
	page[NameObject("/Contents")] = content
	output.addPage(page)

# Write the result; the with-block closes the file when done
with open("output.decompressed.pdf", "wb") as output_stream:
	output.write(output_stream)
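In case it helps anyone adapting this: the matching test can be pulled out into a small function and checked without any PDF at hand. This is only a sketch; should_blank and PREFIXES are names I made up, and it relies on TextStringObject being a str subclass in PyPDF2, which is my understanding of that version. The match is exact, so the double space in the first placeholder matters.

```python
# Hypothetical helper, not part of PyPDF2: decides whether a Tj operand
# should be replaced with an empty string.
PREFIXES = ("Licence  blah", "blah blah blah")  # placeholder strings from the script

def should_blank(text, prefixes=PREFIXES):
    """Return True when text is a str that starts with any of the prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    # Byte strings are rejected, mirroring the isinstance check in the script.
    return isinstance(text, str) and text.startswith(tuple(prefixes))
```

In the loop, the condition would then read: if should_blank(text): ...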
Recompressing with mutool gives a file ~7x bigger than the original. I also tried a couple more tools (qpdf and cpdf), but they barely made a difference:

Code:
mutool.exe convert -O compress -o recompressed.pdf output.decompressed.pdf
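To put a number on "bigger" when comparing tools, a few lines of Python are enough. A rough sketch; size_ratio is a name of my own, and the file names in the comment are the ones used in this post.

```python
import os

def size_ratio(candidate, reference):
    """Return the size of candidate divided by the size of reference (file paths)."""
    return os.path.getsize(candidate) / os.path.getsize(reference)

# Example call, assuming both files exist in the current directory:
# print(size_ratio("recompressed.pdf", "original.pdf"))
```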
---
Edit: Ghostscript is fast and produces a file slightly smaller than the original:

Code:
gswin32c.exe -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
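If you want to call this from the same Python script instead of the command line, something like the following should do. It is a sketch only: build_gs_command and recompress are hypothetical names, the flags are copied from the command above, and gswin32c.exe is assumed to be on the PATH.

```python
import subprocess

def build_gs_command(src, dst, gs_exe="gswin32c.exe"):
    """Build the Ghostscript argument list used above (same flags, same order)."""
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/default",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]

def recompress(src, dst, gs_exe="gswin32c.exe"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(src, dst, gs_exe), check=True)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.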

Last edited by Shohreh; 12-11-2021 at 05:30 PM.