Reasons why PDFs get large?


martienne
08-08-2010, 04:35 AM
My PDFs that I generate end up surprisingly large. For example, a grammar book is now 40 MB.

I was selecting 600 dpi, a black-and-white colour scale, and Adobe's "screen" format, which seems to give the smallest size you can get.

What am I doing wrong?

I want good quality on my Irex DR800, but 40 MB is the size of a 30-minute film! Surely that's too large for an eBook??!

HarryT
08-08-2010, 04:48 AM
It sounds as though your book contains images of pages, rather than text.

Adoby
08-08-2010, 04:53 AM
I assume that you are saving the pages as high-resolution graphics, not text. That is why the book gets so large. Save the text as text instead; it is much more efficient.

If you are scanning, try using the same resolution as the one you are going to use when reading the page. 600 dpi is a very high resolution for text.
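To give a rough feel for why page images are so much heavier than text, here is a minimal sketch in Python. The page size (6 x 9 inches) and the character count (2000 per page) are assumptions chosen for illustration, not figures from this thread, and the image numbers are uncompressed sizes; a real PDF compresses its images, so actual files are smaller.

```python
def raw_bitmap_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page: 6 x 9 inches, scanned as 1-bit black and white.
page_600 = raw_bitmap_bytes(6, 9, 600)   # 2,430,000 bytes
page_160 = raw_bitmap_bytes(6, 9, 160)   # 172,800 bytes

# Assumed text page: 2000 characters. Cyrillic letters take 2 bytes each
# in UTF-8, so this is an upper bound for mixed Russian/Swedish text.
text_page = 2000 * 2                     # 4,000 bytes

print(f"600 dpi image: {page_600:,} bytes")
print(f"160 dpi image: {page_160:,} bytes")
print(f"plain text:    {text_page:,} bytes")
```

Even before compression enters the picture, the gap between an image of a page and the text on it is a factor of hundreds under these assumptions.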

joblack
08-08-2010, 06:24 AM
You don't have to rescan the book. Use the PDF as an input for the OCR software.

martienne
08-08-2010, 07:24 AM
Thanks for the advice so far.

I should explain that I don't have the original book. I got the PDF from a language learning site. It's already been scanned by someone else (please don't worry about copyright; the book was printed in the USSR which did not enforce copyright and only charged symbolic prices for books).

The scan that I have has massive white borders at the top, bottom, left and right. My changes were an attempt to optimize the book for the Irex DR800.

1) So if 600 dpi is too high a resolution, what resolution would you recommend?

2) Yes, you are right that the book is scanned as images, not as text. Can I change it to text, even though the original was scanned as images? How?

HarryT
08-08-2010, 07:31 AM
The DR800 has a screen resolution of about 160dpi, so anything higher than that isn't gaining you anything.

The only way to convert to text is to use OCR software.

martienne
08-08-2010, 08:10 AM
Thanks Harry! OK, I am trying it with Abbyy FineReader. That is one amazing piece of software. The text is in a mix of Russian and Swedish and it hasn't missed a letter so far... I might even succeed in turning this book into an ePub. Very cool.

Of course, as you know, any type of educational material has a deliberate layout which can be hard to retain through conversions. Columns, bold text, info boxes etc.

I think that for a normal novel (read and chuck away) I'd be done a long time ago. But this book is more like a reference. On the other hand, if I can get this right then I can fix up any book!

HarryT
08-08-2010, 08:34 AM
Even without the OCR, if you were to reduce the image resolution from 600 dpi to 200 dpi, you'd make the file roughly 9x smaller, because the pixel count scales with the square of the resolution.
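The arithmetic behind that figure can be sketched in a few lines of Python. It assumes file size is proportional to the number of pixels, which is only approximately true once image compression is involved, so treat the results as estimates.

```python
def downsample_factor(old_dpi, new_dpi):
    """Factor by which the pixel count shrinks when resolution is reduced.

    Resolution applies to both width and height, so the pixel count
    scales with the square of the dpi ratio.
    """
    return (old_dpi / new_dpi) ** 2


factor_200 = downsample_factor(600, 200)   # 9.0
factor_160 = downsample_factor(600, 160)   # 14.0625

# Rough estimates for the 40 MB file from the first post, assuming
# size is proportional to pixel count.
print(f"600 -> 200 dpi: about {40 / factor_200:.1f} MB")
print(f"600 -> 160 dpi: about {40 / factor_160:.1f} MB")
```

Going all the way down to the DR800's roughly 160 dpi gives a factor of about 14 under the same assumption.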

Lady Fitzgerald
08-08-2010, 10:11 PM
If you have Adobe Acrobat (not the Reader), try loading a copy of the book (never mess with your only copy), click on Document at the top of the screen, click on Reduce File Size, then follow the prompts. That will reduce the size of your file, but sometimes it also reduces the quality.