MobileRead Forums > E-Book Formats > PDF

12-17-2011, 11:04 AM   #1
Artha
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
Compressing large PDFs

As long as page turning wasn't slow, I wasn't bothered by the large size of some of my scans. They're bundled into PDFs and OCR'd, yet they weigh more than 300 MB each; I wonder whether two can fit on a CD without archiving them.

Recently I read in another forum about tools that can compress a PDF. But the explanation wasn't clear enough for me, and later the discussion descended into flames. I understand that Adobe Acrobat can do this kind of magic, but last time I checked it was very expensive. Somebody suggested using Ghostscript, yet the original poster showed that going that way would make the file larger. Then somebody talked about DjVu, but I don't want to touch that subject. So in the end I know about as much as I did before reading that thread, yet I feel a need to stop wasting hard drive space and to make those books more portable as well.

Can you help me, tutor me in this black art of PDF compression? Free software if possible.
12-17-2011, 12:01 PM   #2
dwig
Guru
Posts: 979
Karma: 1382338
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current: Dell Venue 8 Pro, Kindle 3/WiFi - Retired: Clie UX50, T415, ...
There is not, and never will be, a single method that works for every alteration or conversion of every PDF, period. Attempting any such action will always be a matter of trial-and-terror.

That said, I would suggest you start by getting Adobe Reader (free) and a PDF virtual printer (aka "print to PDF" software) and giving them a try. There are a number of such virtual printers available. I use PrimoPDF, which is available free, as my PDF virtual printer.

The virtual printer software installs as a "printer" on your system allowing any application that can print to generate a PDF by simply printing to the virtual printer.

Try opening your big PDFs in Reader and then printing to the PrimoPDF (or whichever you get) "printer". Experiment with the settings in PrimoPDF to see which yields the best compromise in file size and image quality.

I've successfully used this method to resolve issues with some complex layered PDFs that contained scanned backgrounds layered behind scanned text. The results were somewhat smaller and performed much better in the reader.
12-17-2011, 12:24 PM   #3
Artha
Hmm, Ghostscript does flatten and linearize without the issues of your method, and it keeps the bookmarks and metadata too. But the bold preaching was cute.
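For reference, a Ghostscript pass of the kind described here can be scripted. This is a sketch under stated assumptions, not a command anyone in the thread posted: it assumes the Ghostscript binary is on the PATH as `gs` (on Windows it is usually `gswin64c`), uses the `/ebook` preset as an arbitrary default (it downsamples images, so `/printer` or `/prepress` may suit detailed scans better), and requests a linearized file with `-dFastWebView=true`. Because re-encoding can make a file larger, as the other thread reportedly found, the hypothetical helper `compress_if_smaller` keeps the output only when it actually beats the original. Whether bookmarks and metadata survive depends on the Ghostscript version and the input, so check the result.

```python
import os
import subprocess

# Preset names accepted by Ghostscript's pdfwrite device for -dPDFSETTINGS.
PRESETS = ("/screen", "/ebook", "/printer", "/prepress", "/default")


def gs_command(src, dst, preset="/ebook", linearize=True, gs="gs"):
    """Build a Ghostscript pdfwrite command line (binary name "gs" is an assumption)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    cmd = [
        gs,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",  # image downsampling/compression preset
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
    ]
    if linearize:
        cmd.append("-dFastWebView=true")  # ask pdfwrite for a linearized PDF
    cmd.append(f"-sOutputFile={dst}")
    cmd.append(str(src))
    return cmd


def compress_if_smaller(src, dst, **options):
    """Run Ghostscript, then keep dst only if it is smaller than src.

    Returns True when a smaller file was written, False otherwise.
    """
    subprocess.run(gs_command(src, dst, **options), check=True)
    if os.path.getsize(dst) >= os.path.getsize(src):
        os.remove(dst)  # re-encoding made it bigger: discard the output
        return False
    return True
```

For example, `compress_if_smaller("book.pdf", "book-small.pdf", preset="/printer")` returns `True` only if the new file is smaller; running it with several presets is one concrete form of the trial-and-error approach.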
12-17-2011, 03:35 PM   #4
dwig
Quote:
Originally Posted by Artha
Hmm, Ghostscript does flatten and linearize without the issues of your method, and it keeps the bookmarks and metadata too. But the bold preaching was cute.
My method has no particular issues except those common to any and all methods. That a method works for one particular PDF never means it will work for any other PDF. There is such a massively wide range of characteristics possible with the PDF format that any manipulation will always be a trial-and-terror situation. Hence, the preface to my first reply.

If you frequently have to either convert or alter PDFs you will want to collect a range of tools. When one doesn't work well simply try another.
12-17-2011, 05:31 PM   #5
Artha
Like...?
12-17-2011, 06:10 PM   #6
garyyoung
Member
Posts: 14
Karma: 380
Join Date: May 2010
Device: Kindle Paperwhite
What is the reason you don't want to try DjVu? I've found it to be much more effective at reducing the size of scans than fiddling with PDF settings. There's free software to do the conversions and to view the files. The downside is that they can't be read on a Kindle (at least not with the factory firmware), and probably not on most other dedicated e-readers out there either.
12-17-2011, 06:13 PM   #7
DSpider
Evangelist
Posts: 414
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
If you still have the scans, a quick and dirty method would be to use Scan Tailor to straighten them, align them (either by page number or by header), and apply the "up to 99.8%" good-enough OCR of ABBYY FineReader 11. The accuracy would probably be lower with those black-and-white images, but whatever... At least they compress better than JPGs. The general ballpark is somewhere between 10 and 35 MB, depending on the book, 256-bit cover and all. Definitely an improvement over 300 MB per book...

The quality method would be to proofread the OCR in FineReader, save it as .docx/.odt, do the layout in Word 2010/LibreOffice, track down the fonts (which is oftentimes a lot harder than it sounds), vectorize the cover (assuming you know how to use Illustrator/Inkscape), and proofread the final product again in case you missed something. This process is a lot more refined, and it can output books of 1 to 3 MB, depending on the book. It can be a little time-consuming, yes. But it's a pleasure to read such a book.

Last edited by DSpider; 12-17-2011 at 06:25 PM.
12-18-2011, 06:32 PM   #8
Artha
Quote:
Originally Posted by garyyoung
What is the reason you don't want to try DjVu? I've found it to be much more effective at reducing the size of scans than fiddling with PDF settings. There's free software to do the conversions and to view the files. The downside is that they can't be read on a Kindle (at least not with the factory firmware), and probably not on most other dedicated e-readers out there either.
Well, most readers don't support DjVu. Also, most people have not heard of DjVu. Also, although the Russian crowd would swear by DjVu, it has the exact same compression algorithms as PDF, even if the names are slightly different. Calibre and other indexing apps don't know how to handle this format and end up treating it as some weird archive. There is no pagination, or at least I haven't seen any DjVu with pagination; I've seen bookmarks, but some readers do not show them. The free software for DjVu usually lacks documentation, or the texts are all in Russian with some self-explanatory pictures. Care for some more reasons to reject an obscure format? The same goes for FB2: I like the concept, but EPUB is more widespread. And when even EPUB is handled badly, though everything about it is published in plain English, why split the developer community and waste resources on yet another format? Also, surely you missed it, since it was only in the subject line, but I asked how to compress PDFs, not DjVus. Sure, even RAR or 7-Zip can bring improvements in file size, but the result is not a PDF anymore. Right?

Quote:
Originally Posted by DSpider
If you still have the scans, a quick and dirty method would be to use Scan Tailor to straighten them, align them (either by page number or by header), and apply the "up to 99.8%" good-enough OCR of ABBYY FineReader 11. The accuracy would probably be lower with those black-and-white images, but whatever... At least they compress better than JPGs. The general ballpark is somewhere between 10 and 35 MB, depending on the book, 256-bit cover and all. Definitely an improvement over 300 MB per book...

The quality method would be to proofread the OCR in FineReader, save it as .docx/.odt, do the layout in Word 2010/LibreOffice, track down the fonts (which is oftentimes a lot harder than it sounds), vectorize the cover (assuming you know how to use Illustrator/Inkscape), and proofread the final product again in case you missed something. This process is a lot more refined, and it can output books of 1 to 3 MB, depending on the book. It can be a little time-consuming, yes. But it's a pleasure to read such a book.
You sound just like one of the preachers in the thread that pushed me to ask this, and the original poster there had to fight with this too. Scan Tailor is for bad scans. This is more than just a scan: I am talking about a PDF e-book. The pages are not only upright, they have also been OCR'd, paginated and bookmarked.

At the risk of hijacking my own thread: what were you thinking of when you brought up that app? I'm not the Internet Archive with photos of some ancient book.

Anyway, I have a bunch of PDFs, and they are large. Reducing the pixel dimensions of each picture just for the sake of a smaller file isn't going to fit the bill. A 700-page novel can be 2 MB; a 700-page architecture book is 350 MB. My feeling is that the latter could get down to 30 MB without a serious downgrade.
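To put these figures in perspective, a quick per-page calculation helps. The numbers (2 MB and 350 MB for 700 pages, and a 30 MB target) come from the post; treating 1 MB as 1000 KB is an assumption made to keep the arithmetic round.

```python
def per_page_kb(total_mb, pages):
    """Average size of one page in KB, assuming 1 MB = 1000 KB."""
    return total_mb * 1000 / pages


novel = per_page_kb(2, 700)     # 700-page text novel
scan = per_page_kb(350, 700)    # 700-page architecture book as it is now
target = per_page_kb(30, 700)   # the hoped-for 30 MB version

print(f"novel:  {novel:.1f} KB/page")                             # novel:  2.9 KB/page
print(f"scan:   {scan:.1f} KB/page")                              # scan:   500.0 KB/page
print(f"target: {target:.1f} KB/page ({scan / target:.1f}x smaller)")  # target: 42.9 KB/page (11.7x smaller)
```

So the 30 MB goal means each scanned page would have to average roughly 43 KB, close to twelve times less than it takes now.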

Also, I am asking for something, a method for example, that can shrink a PDF. I don't want to open my own bookshop, or learn bookbinding, just because a PDF is too large.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.