Hello! I'm new to this so please forgive me if this is basic knowledge.
I have an OCRed PDF file that I would like to convert to EPUB. The main problem is that I'd like to crop the PDF so that I don't end up with duplicate headers or page numbers in the EPUB. I first tried OS X's Preview for the cropping, then Briss. I then ran the result through calibre's EPUB conversion, which didn't work. Next I used Ghostscript to extract the text:
Code:
gs -sDEVICE=txtwrite -o extractedText%d.txt input.pdf
But this doesn't work either: I'm still getting all the headers. Although the PDF is clearly cropped, the cropped-out content does not seem to have been deleted permanently.
Then I read on here that:
If you run the Briss PDF output through Ghostscript to generate a new PDF, I believe it will permanently get rid of the cropped-out material so that it won't come back in calibre.
This user suggested this command:
Code:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Although this does produce a PDF, running it through my first Ghostscript command or through the standard calibre conversion is to no avail: I still get the headers and page numbers. I've also tried different PDFs, just to be sure.
What am I missing here? This can't be that difficult, can it?