MobileRead Forums

Artha · 12-18-2011, 06:32 PM

Quote:

Originally Posted by garyyoung

What is the reason you don't want to try djvu? I've found it to be much more effective at reducing the size of scans than fiddling with pdf settings. There's free software to do the conversions and to view the files. The downside is that they can't be read on a kindle (at least not with the factory firmware) and probably most other dedicated e-readers out there.

Well, most readers don't have support for DjVu. Also, most people have not heard of DjVu. And although the Russian crowd would swear by DjVu, it has the exact same compression algorithms as PDF, even if the name is slightly different. Calibre and other indexing apps do not know how to handle this format, and they end up treating it as some weird archive. There is no pagination, or at least I haven't seen any DjVu with pagination. I have seen bookmarks, but some readers do not show them. The free software for DjVu usually lacks documentation, or the texts are all in Russian with some self-explanatory pics. Care for some more reasons to reject some obscure format? The same goes for FB2. I like the concept, but EPUB is more widespread. And if EPUB is still badly handled even though everything about it is documented in plain English online, why split the developer community and waste resources on some other format?

Also, you surely haven't noticed, even though it was in the subject line, that I am asking how to compress PDFs, not how to compress DjVus. Sure, even RAR or 7zip can bring improvements in file size, but the result is not a PDF anymore. Right?

Quote:

Originally Posted by DSpider

If you still have the scans a quick and dirty method would be to use Scan Tailor to straighten them up, align them (either by page number or header) and apply the "up to 99.8%" good enough OCR with ABBYY FineReader 11 -- the accuracy would probably lower with those black and white images, but whatever... At least they compress better than JPGs. The general ball park is somewhere between 10-35 MB, depending on the book - 256 bit cover and all. Definitely an improvement over 300 MB per book...

The quality method would be to proofread the OCR in FineReader, save it as .docx/.odt, do the layout in Word 2010/LibreOffice, track down fonts (which often times is a lot harder than it sounds), vectorize the cover (assuming you know how to use Illustrator/Inkscape) and proofread the final product again in case you may have missed something. This process is a lot more refined and it can output books between 1-3 MB, depending on the book. It can be a little bit time consuming, yes. But it's a pleasure to read such a book.

You sound just like one of the preachers on the thread that pushed me to ask this. And the original poster had to fight with this too. Scan Tailor is for bad scans. This is more than just a scan. This is a PDF ebook I am talking about. The pages are not only upright, they have also been OCR'd, paginated and bookmarked.

Although I might hijack my own thread: what do you have in mind when you bring up that app? I'm not the Internet Archive with a pile of photos of some ancient book.

Anyway, I have a bunch of PDFs. They are large. Reducing the pixel dimensions of each picture just for the sake of a small file isn't going to fit the bill. A 700 page novel can be 2 MB. A 700 page architecture book is 350 MB. My feeling is that it can reach 30 MB without a serious downgrade.
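To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. It assumes the sizes above are in mebibytes and that the pages are of roughly equal weight, which is not really true for a picture-heavy book. The 30M target is only my guess from above, and the function name is made up for illustration.

```python
def per_page_kib(total_mib: float, pages: int) -> float:
    """Average size of one page in KiB, treating the total as MiB."""
    return total_mib * 1024 / pages


PAGES = 700  # page count quoted for both books

novel = per_page_kib(2, PAGES)           # text-only novel
architecture = per_page_kib(350, PAGES)  # architecture book as it is now
target = per_page_kib(30, PAGES)         # the size I am hoping for (a guess)
reduction = 350 / 30                     # how much smaller the file must get

print(f"novel:        {novel:6.1f} KiB/page")
print(f"architecture: {architecture:6.1f} KiB/page")
print(f"target:       {target:6.1f} KiB/page")
print(f"reduction:    about {reduction:.1f}x")
```

So the goal works out to roughly 44 KiB per page instead of 512 KiB, which is still about fifteen times what the novel spends per page.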

Also, I am asking for something, a method for example, that can shrink a PDF. I do not want to open my own bookshop, or learn bookbinding, just because a PDF is too large.