Quote:
Originally Posted by Rémi Ozene
@ab78727
5) I watched a video on YouTube ("Ghostscript - Convert PDF to black and white using", https://www.youtube.com/watch?v=JtEvCWJX5qw&t=56s ),
which proposes the following script to convert a color PDF file to black and white using Ghostscript:
gs \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
-dPDFSETTINGS=/default \
-sOutputFile=output.pdf \
input.pdf
Since you seem to know how to use Ghostscript, could you or another kind soul explain to me, from A to Z, how to use it with Google Colab ( https://colab.research.google.com/ ) by creating a notebook there (without having to install Python on my computer)? The goal is to lighten PDF files from Archive.org that have dark backgrounds. What exact set of commands do I need to write for this script to run if, for example, my file is called dickens.pdf?
I tried the following on colab.research.google.com and it seemed to retrieve the file and run gs on it (I just gave it some basic arguments and not the ones that you had specified):
Code:
# Install Ghostscript in the Colab VM (-y answers the install prompt)
!apt-get install -y ghostscript
import requests
urlpfx = "https://archive.org/download"
name = input("Enter a name: ")
outname = input("Enter output file name: ")
# Strip stray slashes so the URL does not end up with "//"
url = f"{urlpfx}/{name.strip('/')}"
print(f"retrieving {url}")
response = requests.get(url)
print(response)
if response.status_code == 200:
    # Save the downloaded PDF, then run Ghostscript on it
    with open(f"{outname}.pdf", "wb") as f:
        f.write(response.content)
    !gs -sDEVICE=pdfwrite -o {outname}_new.pdf {outname}.pdf
    print(f"File '{outname}_new.pdf' created successfully.")
else:
    print(f"Failed to download the file. Status code: {response.status_code}")
If you'd prefer, I can share the Colab notebook link directly with you. To use it, enter the last part of the archive.org URL at the "name" prompt (e.g. for "https://archive.org/download/{title}/{pdfname}.pdf", enter "{title}/{pdfname}.pdf"), and then enter the desired output file name at the second prompt. The script downloads the file from archive.org and, if the download succeeds, runs gs on it with the specified arguments. The resulting output file is created as "{outname}_new.pdf" in Colab's "/content" directory. I hope this helps.
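If you want the grayscale options from the script you quoted instead of my basic arguments, here is a minimal sketch of how that could look in a Colab cell. The function names (build_gs_gray_command, convert_to_gray) and the file names are just placeholders I made up for illustration; the gs flags are copied from your script. Note that these flags only convert colors to gray, so they may not lighten a dark background by themselves.

```python
import shutil
import subprocess


def build_gs_gray_command(src, dst):
    """Return the Ghostscript argument list for a grayscale conversion.

    The flags mirror the script quoted in the question; the function
    name and parameters are hypothetical, chosen for this example.
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=Gray",
        "-dProcessColorModel=/DeviceGray",
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE",
        "-dBATCH",
        "-dPDFSETTINGS=/default",
        f"-sOutputFile={dst}",
        src,
    ]


def convert_to_gray(src, dst):
    """Run Ghostscript on src and write the result to dst.

    Raises RuntimeError if gs is not installed, and
    subprocess.CalledProcessError if gs exits with an error.
    """
    if shutil.which("gs") is None:
        raise RuntimeError("gs not found; run '!apt-get install -y ghostscript' first")
    subprocess.run(build_gs_gray_command(src, dst), check=True)


# Example call (placeholder file names; uncomment once the PDF is in /content):
# convert_to_gray("dickens.pdf", "dickens_gray.pdf")
```

Passing the arguments as a list avoids shell quoting problems if the file name contains spaces, and check=True makes a failed conversion raise an error instead of silently printing a success message.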