09-03-2020, 05:34 PM
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Splice PDF: A Script to improve readability by separating images from text

I've written a script to help with my PDF issues. It's written for the bash shell in macOS Automator, so it may need tweaks for other software.

The idea is to split each PDF into three parts and then splice them back together: the cover, which I rasterize; the images from each page, also rasterized; and the text from each page, turned black and inserted after that page's images. This makes the text easier for me to read, and makes it easier for the Kindle to handle the images regardless of how they were constructed. The downside is that it breaks tables of contents.
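The splice step in the script below relies on qpdf's --collate, which, as I understand the qpdf manual, takes one page from each file in turn and skips a file once it runs out of pages. As a rough illustration (a hypothetical helper, not part of the script), this function prints the page order that rule gives for a one-page cover plus image and text files of equal length:

```shell
# collate NAME:COUNT ...: print page labels in the order qpdf --collate takes them.
# Hypothetical helper for illustration; it only prints labels, it doesn't touch PDFs.
collate() {
  local max=0 out="" p=1 arg
  # Find the largest page count
  for arg in "$@"; do
    if [ "${arg##*:}" -gt "$max" ]; then max=${arg##*:}; fi
  done
  # Round p takes page p from every file that still has one
  while [ "$p" -le "$max" ]; do
    for arg in "$@"; do
      if [ "$p" -le "${arg##*:}" ]; then
        out="$out ${arg%%:*}:$p"
      fi
    done
    p=$((p + 1))
  done
  echo "${out# }"
}

collate cover:1 images:3 text:3
# prints: cover:1 images:1 text:1 images:2 text:2 images:3 text:3
```

Because the cover file has only one page, it drops out after the first round, and the rest of the book alternates image page, text page.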

P.S. This does not work with scanned PDFs. For those, I'd suggest using k2pdfopt -mode copy instead.
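If you'd rather have the script notice scanned PDFs itself, one heuristic (my assumption, not something the splice script does) is to extract the text layer with Ghostscript's txtwrite device and treat a blank result as a scan. A minimal sketch with hypothetical function names:

```shell
# looks_scanned: succeeds (exit 0) when standard input has no visible characters,
# i.e. the extracted text layer is blank. Heuristic only.
looks_scanned() {
  ! grep -q '[^[:space:]]'
}

# pdf_is_scanned FILE: hypothetical wrapper that pipes Ghostscript's txtwrite
# output into looks_scanned. If gs fails, its output is blank too, so this
# would also report "scanned"; make sure gs is installed first.
pdf_is_scanned() {
  /usr/local/bin/gs -q -sDEVICE=txtwrite -o - "$1" 2>/dev/null | looks_scanned
}

# Quick check of the heuristic on plain strings
printf '  \n\n' | looks_scanned && echo "blank: treat as scanned"
printf 'Chapter 1\n' | looks_scanned || echo "has text: splice it"
```

Inside the loop you could then run k2pdfopt -mode copy on files where pdf_is_scanned "$f" succeeds, and continue past the splice steps. Image-only pages in an otherwise normal PDF won't trip it, since only a completely blank text layer counts.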

I've also written a variant with -dev dx after each k2pdfopt -mode copy, and with different output file names, for grayscale output optimized for the Kindle DX.
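Rather than keeping two copies of the script, the variant could be picked with one setting. A minimal sketch: the set_variant name and the -SplicedDX.pdf suffix are placeholders of mine, since the variant's actual file names aren't given here.

```shell
# set_variant NAME: choose extra k2pdfopt options and the output suffix.
# Hypothetical helper; "-SplicedDX.pdf" is a placeholder name.
set_variant() {
  case "$1" in
    dx)
      k2_extra="-dev dx"          # k2pdfopt's Kindle DX device profile
      suffix="-SplicedDX.pdf"
      ;;
    *)
      k2_extra=""                 # default: the color version
      suffix="-SplicedColor.pdf"
      ;;
  esac
}

set_variant dx
echo "extra k2pdfopt options: $k2_extra"
echo "output suffix: $suffix"
```

Each k2pdfopt call would then read -mode copy $k2_extra, with $k2_extra deliberately unquoted so that -dev dx splits into two arguments and disappears entirely when empty.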

By default, k2pdfopt increases contrast, so if you'd prefer it didn't, that's another tweak to make.

It requires Ghostscript, Cpdf, K2pdfopt, and Qpdf. Cpdf should be free for non-commercial use, but I'd still prefer an open source alternative to it, and it's no longer available via Homebrew.

I've installed k2pdfopt to ~/Applications and I've installed the others using Homebrew.
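Since the script calls these tools by fixed paths, a quick check before the loop can save a confusing failure halfway through. A minimal sketch with a hypothetical check_deps helper, assuming the install locations above (Homebrew's /usr/local/bin, and ~/Applications for k2pdfopt; on Apple Silicon Macs Homebrew uses /opt/homebrew/bin instead):

```shell
# check_deps PATH...: report any tool that isn't executable; return 1 if any are missing.
check_deps() {
  local missing="" tool
  for tool in "$@"; do
    if [ ! -x "$tool" ]; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  echo "all tools found"
}

check_deps /usr/local/bin/gs /usr/local/bin/cpdf /usr/local/bin/qpdf \
  "$HOME/Applications/k2pdfopt" || echo "install the missing tools first"
```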

Each app seems to handle standard input and standard output slightly differently. In the end, I had each one write to a fixed filename in a "Splice" folder, or read a fixed filename from there. That way I've been able to run the whole sequence: first splitting, then processing, and then splicing the PDF back together.
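The fixed filenames do mean two runs at once would overwrite each other's intermediate files. If that ever matters, one alternative (a swapped-in approach, not what the script does) is a fresh scratch folder per run from mktemp -d, removed automatically on exit. A minimal sketch; scratch and text_pdf are hypothetical names:

```shell
# Create a private scratch folder for this run's intermediate files.
# splice.XXXXXX is a template; mktemp replaces the X's with random characters.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/splice.XXXXXX") || exit 1
# Delete the folder and everything in it when the script exits
trap 'rm -rf "$scratch"' EXIT

# The same fixed names still work, just inside the scratch folder
text_pdf="$scratch/Text.pdf"
echo "intermediate text file: $text_pdf"
```

Using "$scratch" wherever the script names the Splice folder would keep every run separate while leaving each tool's fixed input and output names unchanged.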

I haven't yet replaced all the older code that uses backticks (`...`) instead of $(...); maybe eventually.
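For anyone tidying this up: backticks and $(...) are both command substitution, so the backtick line in the script can be swapped directly. $(...) is generally preferred because it nests without extra escaping. For example, with a made-up path:

```shell
f="/Users/me/Books/My Book.pdf"        # example path (hypothetical)
old=`basename "$f" .pdf`               # older backtick form
new=$(basename "$f" .pdf)              # equivalent $(...) form
echo "$old"
echo "$new"
# Nesting is where $(...) helps: the folder name, with no escaping needed
dir_name=$(basename "$(dirname "$f")")
echo "$dir_name"
```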

# Folder for the intermediate files: each tool writes or reads a fixed filename here
splice="$HOME/Splice"
mkdir -p "$splice"

for f in "$@"
do
# Copy and rasterize the 1st page (the cover) from the source pdf using k2pdfopt
~/Applications/k2pdfopt -ui -mode copy -p 1 -x -o "$splice/RGBCover_copy.pdf" "$f"
# Copy text from the source pdf using Ghostscript, then turn the text black using Cpdf
# The color conversion strategy should help with the 2nd stage if I switch to Ghostscript
# (gs can write standard output with -sOutputFile=- and read standard input with - or -_,
# but due to compatibility issues I dump to ~/Splice/Text.pdf instead)
/usr/local/bin/gs -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dCompatibilityLevel=1.4 \
  -sColorConversionStrategy=RGB -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH \
  -sOutputFile="$splice/Text.pdf" "$f" &&
/usr/local/bin/cpdf "$splice/Text.pdf" -blacktext -o "$splice/Blacktext.pdf"
# Copy images from the same source pdf using Ghostscript, then rasterize them using k2pdfopt
# Due to compatibility issues, dumping to ~/Splice/Images.pdf
/usr/local/bin/gs -sDEVICE=pdfimage24 -dFILTERTEXT -dCompatibilityLevel=1.4 \
  -g800x1080 -r150 -dPDFFitPage \
  -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$splice/Images.pdf" "$f" &&
~/Applications/k2pdfopt -ui -mode copy -x -o "$splice/RGBImages_copy.pdf" "$splice/Images.pdf" &&
# Splice files using qpdf: --collate takes the cover page, then alternates image and text pages
suffix="-SplicedColor.pdf"
base=`basename "$f" .pdf`
# The spliced file is written to the current working directory
outputfile=$base$suffix
/usr/local/bin/qpdf --collate "$splice/RGBCover_copy.pdf" --pages "$splice/RGBCover_copy.pdf" "$splice/RGBImages_copy.pdf" "$splice/Blacktext.pdf" -- "$outputfile"
done
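Because outputfile is just a bare filename, the spliced PDF lands in whatever the current working directory is when the script runs. If you'd prefer it next to the source file, a small helper could build the full path. A sketch; spliced_path is a hypothetical name:

```shell
# spliced_path SRC [SUFFIX]: output path in the same folder as SRC.
spliced_path() {
  local src="$1" suffix="${2:--SplicedColor.pdf}" dir base
  dir=$(dirname "$src")
  base=$(basename "$src" .pdf)
  echo "$dir/$base$suffix"
}

spliced_path "/Users/me/Books/My Book.pdf"
# prints: /Users/me/Books/My Book-SplicedColor.pdf
```

In the loop, outputfile=$(spliced_path "$f") would then replace the suffix, base, and outputfile lines.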

Last edited by MarjaE; 09-03-2020 at 05:45 PM.