It's for my own use, so I don't really care if it doesn't look pretty or if the OCR'd text is, let's say, 95-98% accurate. I can deal with the occasional typo, and I won't throw away the Xerox copy in case there is something I need to check against it while reading.
Ultimately, my purpose here is convenience and saving time. [You can skip this part, as it's not about PDF optimization.]:
They are academic (humanities) books I use in my research.
My usual process for using a book in research is:
-Read the book and make little annotations near the relevant parts
-Xerox only those pages that contain what I may need to quote and cite when writing
-Scan them into the PC as JPEGs (or as a PDF)
-Take notes on the book in MS Word, including brief summaries of the specific passages I may need to cite and where to find them in the book
-If I save them as JPEGs, name each file after its page number
I'm sure this sounds tedious, but trust me: when it came time to write my dissertation (2006), having all my material scanned into the computer (and having two monitors) made life considerably easier. No stacks of paper spread all over my floor; no serious time wasted transcribing hundreds of quotes, half of which I didn't end up using; and all my material stored on a flash drive so I could write wherever I was.
Obviously, reading ebooks (as PDFs) on my iPad eliminates many of these steps, and that is what I do with the books I can find as ebooks.
So it occurred to me to experiment by scanning in an entire book.
Some things to note:
-It's the Xerox machine that sends it as a "compact PDF"; it's one of the settings. I have no idea what exactly it does, but a file that would otherwise be 10-15 MB comes out under 2 MB when I select compact. I can see no difference in the results, and I had no trouble running the compact PDF through OCR.
So my goals here are to (1) save time and make things more convenient, (2) avoid ending up with massive files, and (3) do both without sacrificing (or rather, risking) reliability.
As near as I can tell, it's the fonts Nitro embeds that are adding the bloat; how else to explain 500 KB instead of 7 MB?
500 KB sounds like a normal size for an ebook.
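One rough way to test the embedded-fonts theory is to count how many embedded font programs each version of the file references. The Python sketch below is a crude heuristic, not a real PDF parser, and `count_embedded_fonts` is just a hypothetical helper name. It relies on the fact that a PDF points to an embedded font program through the `/FontFile`, `/FontFile2` or `/FontFile3` key, and it assumes that any compressed streams use Flate (zlib) compression.

```python
import re
import sys
import zlib

# /FontFile, /FontFile2 and /FontFile3 are the font-descriptor keys that point
# at an embedded font program; the lookahead keeps "/FontFile" from also
# matching the start of "/FontFile2".
FONT_KEY = re.compile(rb"/FontFile[23]?(?![A-Za-z0-9])")
# Crude stream matcher (an assumption, not a full parser): grabs the bytes
# between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)


def decoded_chunks(data: bytes):
    """Yield the raw file plus every stream that inflates with zlib.

    Newer PDFs can pack dictionaries into compressed object streams, so
    searching only the raw bytes would miss those font references.
    """
    yield data
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the compressed data
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG page image): skip it


def count_embedded_fonts(data: bytes) -> int:
    """Count references to embedded font programs (heuristic only)."""
    return sum(len(FONT_KEY.findall(chunk)) for chunk in decoded_chunks(data))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            n = count_embedded_fonts(fh.read())
        print(f"{path}: {n} embedded font program(s) referenced")
```

Run it as something like `python fontcheck.py nitro.pdf compact.pdf` (the file names are placeholders). A count of zero for the small file and a nonzero count for the large one would support the theory, although the page images themselves may still account for much of the size difference.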
But can someone explain the difference between "searchable text image" and "editable text", and what is at stake in choosing one over the other? And whether removing the embedded fonts matters or not?
Again, this is all for myself. I'm not trying to create a pirated ebook to circulate. But I do need to be confident that it will look OK on multiple PCs and on future versions of Windows and iOS and what have you. I know a JPEG will never be an issue, but with these PDFs I have no idea.
It's about saving time without wasting too much hard-drive space.