Post-processing scanned PDFs

Aesys · 10-02-2017, 12:28 PM

Hi

I was recommended to post this on MobileRead, so hopefully you lovely peeps can help.

I have a couple of questions that all converge on post-processing.

1 - I downloaded a load of out-of-copyright books from Archive.org.
See an example at https://archive.org/details/receiptbook00rolf
In the above example, the book has been photographed and its paper colour is included within the PDF. When I scan my own books, my Brother MFC scanner has a background removal feature that does a great job, but I can't find an equivalent for things that have already been scanned.
Foxit PhantomPDF has a 'convert to' function, but with the old fonts and lower scan quality it screws things up big time.

2 - There have been a couple of items and books that I have destructively scanned through said Brother MFC printer scanner. The only scan options on it are black and white or colour. However, when you have a book printed in black and white and nearly every page has a colour image, the whole scan needs to be colour, which over a couple of hundred pages increases the size of the PDF.
I remember an old scanner that had a combined black and white / colour setting, but this new one does not.

3 - The Brother MFC printer scanner scans strictly A4 or A3, so if a book or document is a little smaller, there are edge markings / gaps. I have spoken to Brother, and the non-cropping is a feature of the device (lol), not a setting to change. Any recommendations on how I post-process these?

I have Foxit PhantomPDF, which does a great job of OCR and resizing, but I cannot seem to convert books to black and white, get it to change the spec to mixed black and white & colour, or even crop a PDF.
I am aware that Photoshop can do these tasks manually from an image extract of the PDF, but how do I automate a variable process, especially when there are a couple of hundred pages?

Are there other processes or software that can be used? I have the above, but am unwilling to spend on Acrobat or other software until I know my issues 'will' be resolved, and of course free or cheaper solutions are preferred.

I currently run Windows under Boot Camp as my default OS and am happy to triple-boot Linux, but some strict instructions are necessary, as the last time I got involved in the command line, I had to reinstall my Windows partition.

I would like to non-destructively photo-scan a number of my books, but I want to resolve the above before I invest time and money in building a book scanner.

Ta

Dropbox link for a scanned example I would like to clean:
https://www.dropbox.com/s/87fpnx96lg...ement.pdf?dl=0

orebmur · 10-02-2017, 01:10 PM

If you have never heard of Scan Tailor, it is about time.
Great software; it enabled me to successfully split and clean up some scanned PDF files.
Check out github.com/scantailor/scantailor/wiki for documentation and some screenshots.
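
To give a rough idea of the kind of per-page clean-up that can be automated (background removal from question 1 and cropping from question 3), here is a minimal Python sketch. It is not how Scan Tailor works internally; it only illustrates the logic. Pages are plain lists of rows of 0-255 grey values so that it runs without any imaging library, and the function names and the `threshold` default are made up for illustration. In practice you would load the pixels from page images extracted from the PDF.

```python
# Illustrative only: a "page" is a list of rows, each row a list of
# grey values from 0 (black) to 255 (white).

def whiten_background(page, threshold=160):
    """Turn every pixel at or above `threshold` into pure white (255)."""
    return [[255 if px >= threshold else px for px in row] for row in page]


def content_bbox(page, white=255):
    """Bounding box (left, top, right, bottom) of all non-white pixels.

    Right and bottom are exclusive. Returns None for a blank page.
    """
    rows = [y for y, row in enumerate(page) if any(px < white for px in row)]
    if not rows:
        return None
    cols = [x for row in page for x, px in enumerate(row) if px < white]
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


def crop(page, bbox):
    """Cut the page down to the given bounding box."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in page[top:bottom]]


def clean_page(page, threshold=160):
    """Whiten the paper colour, then crop away the empty margins."""
    whitened = whiten_background(page, threshold)
    bbox = content_bbox(whitened)
    return whitened if bbox is None else crop(whitened, bbox)


# Tiny 4x4 example: greyish paper (200) with a few dark "ink" pixels.
sample = [
    [200, 200, 200, 200],
    [200,  30,  40, 200],
    [200, 200,  20, 200],
    [200, 200, 200, 200],
]
cleaned = clean_page(sample)
```

One caveat: dark edge markings from the scanner bed count as "content" here, so a real tool has to detect and ignore them before cropping. This sketch does not attempt that.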

Aesys · 10-03-2017, 03:42 PM

Quote:

Originally Posted by orebmur

If you have never heard of Scan Tailor, it is about time.
Great software; it enabled me to successfully split and clean up some scanned PDF files.
Check out github.com/scantailor/scantailor/wiki for documentation and some screenshots.

Many thanks, I will be trying this; it looks perfect.
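
For question 2 (a mostly black and white book with colour images on some pages), one possible approach is to decide page by page whether colour is really needed, so that only those pages are kept in colour. Below is a minimal Python sketch of that decision, assuming each page is available as rows of (R, G, B) tuples. The names `page_mode`, `tolerance` and `min_fraction` and their default values are hypothetical, not taken from any particular tool.

```python
# Illustrative only: a "page" is a list of rows of (R, G, B) tuples, 0-255 each.

def is_colour_pixel(rgb, tolerance=24):
    """A pixel counts as coloured when its channels differ by more than `tolerance`."""
    r, g, b = rgb
    return max(r, g, b) - min(r, g, b) > tolerance


def page_mode(page, tolerance=24, min_fraction=0.01):
    """Return 'colour' if at least `min_fraction` of the pixels are coloured, else 'bw'."""
    total = sum(len(row) for row in page)
    if total == 0:
        return "bw"
    coloured = sum(is_colour_pixel(px, tolerance) for row in page for px in row)
    return "colour" if coloured / total >= min_fraction else "bw"


# A text-only page: near-white paper and near-black ink.
text_page = [
    [(250, 248, 245), (20, 20, 22)],
    [(255, 255, 255), (10, 12, 10)],
]

# A page with an illustration: some strongly red and blue pixels.
picture_page = [
    [(200, 40, 30), (20, 20, 22)],
    [(255, 255, 255), (30, 90, 200)],
]

modes = [page_mode(text_page), page_mode(picture_page)]
```

Note that yellowed paper also reads as "coloured" with this test, so the paper background would need to be whitened first, or the thresholds tuned per book.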
