02-14-2013, 03:59 AM   #3
Tex2002ans
Quote:
Originally Posted by shmendrapolk
Does anyone have experience with scanning a book and then optimizing it as a pdf file?
Mind posting a sample of what you are working with? An entire page, or a piece of one (say, a page chopped in half horizontally), so we can see exactly what we are dealing with.

Are these clean scans? Or are there lots of speckles, page edges, and scanning artifacts?

Is this just for your own use, or for others? (If it is only for you, cleaning up the PDF won't really matter as long as you are fine with the quality.)

Quote:
Originally Posted by shmendrapolk
The book was 51 (double column) pages. I find that selecting "compact pdf" results in a file that's not too large but fully readable.
Where are you choosing this "Compact PDF" option: on the Xerox, or in Nitro?

Quote:
Originally Posted by shmendrapolk
I decided to run it through OCR software (Nitro 7) so I could have a document with searchable text.

.....

I used the default setting, "searchable text image". I ended up with a 60MB file. And I don't understand why. Why is it 30x larger than the original?
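That sounds like a problem with Nitro's output settings, which are producing extremely bloated documents. The hidden text layer that OCR adds is small, so a jump that large usually means the page images were re-encoded at a higher quality or resolution than in your original scan.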
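
To put rough numbers on it: 60MB being 30x larger puts your original at about 2MB. Here is a quick back-of-the-envelope sketch in Python of how much raw image data 51 scanned pages hold at different bit depths. The page size (US Letter, 8.5 x 11 in) and the 300 dpi resolution are my assumptions, not something I know about your scan, and the function names are just made up for illustration. I also have no idea what Nitro actually does internally, so treat this as a plausibility check and not a diagnosis.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one scanned page image, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def raw_book_mb(pages: int, width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> float:
    """Uncompressed size of all page images in the book, in MB (10^6 bytes)."""
    return pages * raw_page_bytes(width_in, height_in, dpi, bits_per_pixel) / 1_000_000


if __name__ == "__main__":
    PAGES = 51                      # from the original post
    WIDTH_IN, HEIGHT_IN = 8.5, 11   # assumed: US Letter pages
    DPI = 300                       # assumed: a typical scan resolution

    for label, bpp in (("1-bit black and white", 1),
                       ("8-bit grayscale", 8),
                       ("24-bit color", 24)):
        size_mb = raw_book_mb(PAGES, WIDTH_IN, HEIGHT_IN, DPI, bpp)
        print(f"{label}: {size_mb:.1f} MB before compression")
```

The point is only that the same 51 pages carry 24 times more raw data as 24-bit color than as 1-bit black and white, before any compression. So if a "compact" scan is stored as black and white and the OCR program saves the pages back out as grayscale or color, a 30x increase is not hard to reach.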

There are other OCR programs out there. Here is a list of them on Wikipedia:

https://en.wikipedia.org/wiki/Compar...ition_software

I personally use ABBYY FineReader.

Quote:
Originally Posted by shmendrapolk
I then tried the alternative setting - "editable text". The resulting document looked the same except the few images and some artifacts were removed. But the file was still 7MB, considerably larger than the original.
Is your goal to keep the original scan as the visible page, with a hidden, searchable text layer behind it?

Or are you just trying to output the OCRed text and images only?