03-05-2012, 04:21 PM   #13
tuxor
Okay, since my approach seems to work, here is the small bash script I wrote to produce the PNG-based PDF version:
Code:
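#!/bin/bash
# Usage: pass the input PDF as the first argument; the result is written to output.pdf
# 416 is the page count of the PDF this was written for
for i in {1..416}
do
   j=$(printf %03d "$i")                             # zero-pad so the glob below sorts in page order
   pdfimages -j -f "$i" -l "$i" "$1" __tmpfile       # extract the images of page i only
   rm -f __tmpfile*.ppm                              # discard the non-monochrome images
   convert -negate __tmpfile*.pbm "__tmpimg$j.png"   # invert the monochrome bitmap, save as PNG
   rm -f __tmpfile*.pbm
   convert "__tmpimg$j.png" "__tmpimg$j.pdf"         # wrap the PNG in a one-page PDF
   rm -f __tmpimg*.png
done
pdftk __tmpimg*.pdf cat output output.pdf            # join the single pages into one file
rm -f __tmpimg*.pdf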
This script takes the path to the input PDF as its only argument and writes "output.pdf" to the working directory. The page count (416) is hard-coded in the loop. The final PDF will be approx. 54 MB, and the procedure takes a long time and uses a lot of CPU power. The same script probably won't work with most other PDFs, but there's a good chance it will work with some of the PDFs on archive.org that were produced by the same OCR software.
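
If you want to try it on another file, the hard-coded 416 is the first thing to change. Here is a rough sketch of how the page count could be read from the PDF instead, assuming pdfinfo is installed (it normally ships in the same package as pdfimages). The helper names parse_pages and pad_page are made up for this example and are not part of the script above; the loop only prints what it would do.

```shell
#!/bin/bash
# Sketch only: parse_pages and pad_page are hypothetical helper names.

# Read `pdfinfo` output on stdin and print the number after "Pages:".
parse_pages() {
    awk '/^Pages:/ { print $2; exit }'
}

# Zero-pad a page number to three digits (same printf format as in the loop above).
pad_page() {
    printf '%03d' "$1"
}

# Only run when a PDF path is given, so the helpers can be tried on their own.
if [ "$#" -ge 1 ]; then
    pages=$(pdfinfo "$1" | parse_pages)
    for i in $(seq 1 "$pages"); do
        echo "would process page $(pad_page "$i") of $pages"
    done
fi
```

The three-digit padding assumes fewer than 1000 pages; beyond that the glob passed to pdftk would no longer sort in page order.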

Unfortunately, the script won't run on a stock Windows install, since it needs bash plus the pdfimages, convert (ImageMagick) and pdftk tools. But I uploaded the whole converted file and will send the link via PM on request.