12-17-2011, 11:04 AM | #1 |
-----
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
|
Compressing large PDFs
As long as page turning isn't slow, the large size of some of my scans never bothered me. They're bundled into PDFs and OCR'd, yet they weigh more than 300 MB each; I wonder whether two would even fit on a CD without archiving them first.
Now, I happened to read on another forum about tools that can compress a PDF. It wasn't clear enough for me, though, and the discussion later descended into flames. I understand that Adobe Acrobat can do this kind of magic, but last time I checked it was very expensive. Somebody suggested using Ghostscript, yet the original poster showed that going that way made the file larger. Then somebody brought up DjVu, but I don't want to touch that subject. So in the end I know about as much as I did before reading that thread, yet I feel a need to stop wasting hard drive space and to make those books more portable as well. Can you help me, tutor me in this black art of PDF compression? Free software if possible. |
12-17-2011, 12:01 PM | #2 |
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
There is not and never will be a single method that always works for doing any alteration or conversion of all PDFs, period. Attempting any such action will always be a matter of trial-and-terror.
That said, I would suggest you start by getting Adobe Reader (free) and a PDF virtual printer (aka "print to PDF" software) and give them a try. There are a number of such virtual printers available. I use PrimoPDF, which is available free, as my PDF virtual printer. The virtual printer software installs as a "printer" on your system, allowing any application that can print to generate a PDF by simply printing to the virtual printer. Try opening your big PDFs in Reader and then printing to the PrimoPDF (or whichever you get) "printer". Experiment with the settings in PrimoPDF to see which yields the best compromise between file size and image quality. I've successfully used this method to resolve issues with some complex layered PDFs that contained scanned backgrounds layered behind scanned text. The results were somewhat smaller and performed much better in the reader. |
12-17-2011, 12:24 PM | #3 |
-----
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
|
Hmm, Ghostscript does flatten and linearize without the issues of your method, and it keeps the bookmarks and metadata too. But the bold preaching was cute.
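For anyone following along, here is a minimal sketch of the kind of Ghostscript run being discussed, wrapped in Python. It assumes Ghostscript is installed and reachable as `gs` (on Windows the console binary usually has a different name), and the function names, file names and the `/ebook` preset are only illustrative choices, not anything from this thread. Since re-distilling can also make a file larger, as the other forum's poster found, the sketch compares sizes afterwards instead of assuming a win.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src, dst, preset="/ebook", gs_binary="gs"):
    """Return the argument list for a Ghostscript pdfwrite run.

    The preset is one of Ghostscript's -dPDFSETTINGS values
    (/screen, /ebook, /printer, /prepress); /ebook is an assumption here.
    """
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",        # re-write the input as a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",  # controls image downsampling/compression
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        str(src),
    ]


def shrink_pdf(src, dst, preset="/ebook", gs_binary="gs"):
    """Run Ghostscript, then report (is_smaller, bytes_before, bytes_after)."""
    if shutil.which(gs_binary) is None:
        raise RuntimeError(f"Ghostscript binary {gs_binary!r} not found on PATH")
    subprocess.run(build_gs_command(src, dst, preset, gs_binary), check=True)
    before = Path(src).stat().st_size
    after = Path(dst).stat().st_size
    return after < before, before, after
```

Calling `shrink_pdf("big-scan.pdf", "big-scan-small.pdf")` (hypothetical file names) returns whether the output is actually smaller, so the original can be kept when it is not. Trying more than one preset on a single book and checking the pages by eye is probably the only reliable way to pick a setting.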
|
12-17-2011, 03:35 PM | #4 | |
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
If you frequently have to either convert or alter PDFs you will want to collect a range of tools. When one doesn't work well simply try another. |
|
12-17-2011, 05:31 PM | #5 |
-----
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
|
Like...?
|
12-17-2011, 06:10 PM | #6 | |
Member
Posts: 16
Karma: 59042
Join Date: May 2010
Device: Kindle Voyage, PW1
|
Quote:
|
|
12-17-2011, 06:13 PM | #7 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
If you still have the scans, a quick and dirty method would be to use Scan Tailor to straighten them up, align them (either by page number or header) and apply the "up to 99.8%" good-enough OCR with ABBYY FineReader 11 -- the accuracy would probably be lower with those black and white images, but whatever... At least they compress better than JPGs. The general ballpark is somewhere between 10-35 MB, depending on the book - 256 bit cover and all. Definitely an improvement over 300 MB per book...
The quality method would be to proofread the OCR in FineReader, save it as .docx/.odt, do the layout in Word 2010/LibreOffice, track down fonts (which is often a lot harder than it sounds), vectorize the cover (assuming you know how to use Illustrator/Inkscape) and proofread the final product again in case you missed something. This process is a lot more refined and it can output books between 1-3 MB, depending on the book. It can be a little bit time consuming, yes. But it's a pleasure to read such a book. Last edited by DSpider; 12-17-2011 at 06:25 PM. |
12-18-2011, 06:32 PM | #8 | ||
-----
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
|
Quote:
Quote:
Although I might hijack my own thread: what do you have in mind when you bring up that app? I'm not the Internet Archive, sitting on photos of some ancient book. Anyway, I have a bunch of PDFs. They are large. Reducing the pixel dimensions of each picture just for the sake of a small file isn't going to fit the bill. A 700-page novel can be 2 MB. A 700-page architecture book is 350 MB. My feeling is that it could get down to 30 MB without a serious loss of quality. Also, I am asking for something, a method for example, that can shrink a PDF. I do not want to open my own bookshop, or maybe learn bookbinding, just because a PDF is too large. |
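To put those figures side by side, a rough back-of-the-envelope sketch follows. It only uses the numbers quoted in the post (700 pages; 2, 350 and 30 MB); the helper name is made up for illustration, and 1 MB is taken as 1024 KB.

```python
def per_page_kb(total_mb, pages):
    """Average size of one page in KB, taking 1 MB = 1024 KB."""
    return total_mb * 1024 / pages


PAGES = 700  # page count quoted in the post

# Sizes quoted in the post, in MB.
books = {
    "text novel": 2,
    "architecture book (now)": 350,
    "architecture book (hoped for)": 30,
}

for label, size_mb in books.items():
    print(f"{label}: {per_page_kb(size_mb, PAGES):.1f} KB per page")

# How much smaller the hoped-for file is than the current one.
print(f"reduction factor: {350 / 30:.1f}x")
```

So the hoped-for 30 MB means squeezing each scanned page from about half a megabyte down to a few dozen kilobytes, roughly a twelvefold reduction, which is why the choice of image compression matters far more than anything else in the file.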
||
|