Old 08-04-2016, 05:17 PM   #1
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Minimising the size of a page-scan PDF

My university library, to which I have access as an alumnus, has a superb collection of the classic works of Egyptology, many of which were published in the 19th and early 20th centuries and are long out of copyright. I have, therefore, been gradually borrowing and copying them for my personal collection. Because Egyptology books tend to have things like hieroglyphs in them, as well as lots of drawings and other illustrations, I'm creating page-scanned PDFs to read on my iPad. My process is as follows:

1. Put the book on the floor in good light.
2. Photograph each page with my DSLR.
3. Process the raw images in Adobe Lightroom to boost contrast, trim margins, etc.
4. Export all the page images as JPEGs.
5. Zip all the images up and rename the ZIP file to have a ".CBZ" extension.
6. Import the CBZ file into Calibre.
7. Do a conversion to PDF in Calibre.
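Step 5 could also be scripted. The following is only a sketch using Python's standard `zipfile` module; the function name `make_cbz` and the placeholder page bytes are made up for illustration. It stores the files without further compression (`ZIP_STORED`), on the assumption that the pages are already JPEG-compressed.

```python
import io
import zipfile


def make_cbz(pages):
    """Pack {filename: image_bytes} into CBZ (plain ZIP) bytes.

    Files are added in sorted name order so readers show pages in sequence.
    ZIP_STORED is used because JPEG data is already compressed.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name in sorted(pages):
            archive.writestr(name, pages[name])
    return buffer.getvalue()


# Placeholder bytes stand in for real exported JPEG files.
demo_pages = {"002.jpg": b"page two", "001.jpg": b"page one"}
cbz_bytes = make_cbz(demo_pages)
with zipfile.ZipFile(io.BytesIO(cbz_bytes)) as archive:
    page_names = archive.namelist()
print(page_names)
```

Writing the returned bytes to a file with a ".cbz" extension would give the same kind of archive as zipping and renaming by hand.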

With this method I can create a beautiful page-scan PDF of a 200-page book in about 3 hours, and it looks superb on my iPad.

But...

... It's huge!

A high-quality JPEG image of a single page is typically around 600 KB, so a 200-page book ends up as a PDF file about 120 MB in size (the size of the PDF is basically just the sum of the sizes of all the page images). This is a really, really nice PDF that I can zoom in on quite a lot on my iPad (handy for images) with it still looking good.

A PDF of that size isn't particularly a problem on my 128 GB iPad, but I notice that most equivalent page-scan PDFs I download from "archive.org" are only 10-20 MB in size, i.e. roughly 8-17% of the size of mine. They don't look quite as good as mine, but they're pretty good!

Does anyone know how they do this? If I reduce the size and/or quality of my page images from 600 KB to 60 KB the result looks appalling. How could I get PDFs a tenth the size of the ones I'm currently creating which still look reasonable?
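For reference, the arithmetic behind these figures can be sketched as below. This is a back-of-envelope calculation only: it assumes the PDF is simply the sum of its page images (ignoring PDF overhead) and uses 1 MB = 1000 KB; the function names are made up.

```python
def pdf_size_mb(pages, kb_per_page):
    """Approximate PDF size as the sum of the page images (1 MB = 1000 KB)."""
    return pages * kb_per_page / 1000


def page_budget_kb(pages, target_mb):
    """Average size each page image may have to hit a target PDF size."""
    return target_mb * 1000 / pages


current_mb = pdf_size_mb(200, 600)     # 200 pages at about 600 KB each
budget_low = page_budget_kb(200, 10)   # per-page budget for a 10 MB PDF
budget_high = page_budget_kb(200, 20)  # per-page budget for a 20 MB PDF
print(current_mb, budget_low, budget_high)
```

So matching a 10-20 MB file for a 200-page book means averaging roughly 50-100 KB per page image.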

Any advice would be gratefully received!

I think I actually may know a part of the reason myself. Because I'm photographing the pages with an excellent camera, rather than scanning them, my page images are superb quality. If I zoom in I can see the individual fibres in the surface of the paper. All that "information" in the image is producing big JPEGs. The "archive.org" images just have a "flat" paper background which presumably results in small images, because the text is pretty much the only information on the page. Anyone know how I can remove that fine detail from my page images without making the words blurry too?
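The effect of a textured versus a flat background can be illustrated with a small experiment. This is a sketch under loud assumptions: the "page" is synthetic, and `zlib` (DEFLATE, the kind of lossless compression PNG uses) stands in for a real image codec, so the numbers say nothing precise about JPEG.

```python
import random
import zlib

random.seed(0)  # fixed seed so the run is repeatable

WIDTH, HEIGHT = 200, 200


def synthetic_page(noisy):
    """Return raw 8-bit greyscale bytes: light paper with dark 'text' blocks."""
    rows = []
    for y in range(HEIGHT):
        row = bytearray()
        for x in range(WIDTH):
            if y % 20 < 4 and x % 10 < 6:
                row.append(20)  # dark ink
            elif noisy:
                row.append(random.randint(200, 240))  # paper-fibre texture
            else:
                row.append(255)  # flat white paper
        rows.append(bytes(row))
    return b"".join(rows)


textured_size = len(zlib.compress(synthetic_page(noisy=True), 9))
flat_size = len(zlib.compress(synthetic_page(noisy=False), 9))
print(textured_size, flat_size)
```

The "ink" is identical in both versions; only the background differs, and the flat version compresses to a small fraction of the textured one.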

Last edited by HarryT; 08-04-2016 at 05:20 PM.
Old 08-06-2016, 08:15 AM   #2
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
How do you reduce the size and quality of the images? I'd try first going to grayscale (unless the page has colours), and then adjusting the levels (not just the contrast) to have a white background with no texture and black text. Then reduce the pixel size to the minimum you'll be satisfied with, and save as JPG with the highest compression level you are satisfied with. Reducing the number of grey levels to something like 16 before saving could also help.

And do you need to convert it to PDF? Can't you just read the CBZ (CBR or CB7 could be somewhat smaller)?

If you post a couple of sample pages I could have a go and give you some specific settings...
Old 08-06-2016, 08:28 AM   #3
HarryT
eBook Enthusiast
Hi Jellby,

The reason I prefer PDF is that it's much more portable than CBZ.

Please find attached a couple of sample pages. These are at the minimum resolution I consider to be readable. I need to have the hieroglyphs sharp and clear. Any suggestions for significantly reducing the size of the page image would be very gratefully received.
Attached Thumbnails
015 -_.jpg (615.2 KB)
225 -_.jpg (640.6 KB)
Old 08-06-2016, 09:40 AM   #4
Jellby
frumious Bandersnatch
Let's see if this helps. I did this with GIMP, which may behave differently from Photoshop.

015: Changed to greyscale mode and adjusted levels, with input levels from 70 (black) to 230 (white). Then saved three ways:
1. JPG at 85% quality (my default): 015 -_(2).jpg, 40% of the original size.
2. JPG at 30% quality: 015 -_(3).jpg, 19% of the original size.
3. Indexed color mode with a maximum of 16 levels, saved as PNG at maximum compression: 015 -_(4).png, 18% of the original size.

225: Same process, but with the levels adjusted between 85 and 215. 225 -_(2).jpg: 38%. 225 -_(3).jpg: 19%. 225 -_(4).png: 16%.

I think the 30% quality JPGs are too aggressive and blurry, but the PNGs look quite acceptable.
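For anyone wanting to script the levels and 16-level steps, here is a minimal sketch of the idea in plain Python. It is not what GIMP does internally: the linear stretch and the evenly spaced 16 greys are assumptions (GIMP's indexed conversion chooses its own palette), and the function names and the tiny sample "page" are made up. A real page would be loaded and saved with an image library.

```python
def adjust_levels(pixel, black=70, white=230):
    """Linearly stretch an 8-bit grey value so black -> 0 and white -> 255."""
    if pixel <= black:
        return 0
    if pixel >= white:
        return 255
    return round((pixel - black) * 255 / (white - black))


def quantize(pixel, levels=16):
    """Snap an 8-bit grey value to the nearest of `levels` evenly spaced greys."""
    step = 255 / (levels - 1)
    return round(round(pixel / step) * step)


def process(image):
    """Apply levels, then quantization, to a list of rows of 8-bit grey values."""
    return [[quantize(adjust_levels(p)) for p in row] for row in image]


# Tiny made-up 'page': ink (40), a grey edge pixel (134), textured paper (225-245).
page = [[40, 134, 225], [235, 240, 245]]
result = process(page)
print(result)
```

The ink goes to pure black, the textured paper values all collapse to the same white, and the edge pixel keeps an intermediate grey, which is what lets the result compress well without jagged text.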
Attached Thumbnails
levels.png (4.5 KB)
015 -_(2).jpg (246.8 KB)
015 -_(3).jpg (119.4 KB)
015 -_(4).png (112.5 KB)
225 -_(2).jpg (245.6 KB)
225 -_(3).jpg (119.8 KB)
225 -_(4).png (104.1 KB)
Old 08-06-2016, 09:46 AM   #5
HarryT
eBook Enthusiast
Thank you very much indeed for your excellent suggestions. I'll study the results and see what I think is acceptable. Really appreciate the help - thanks!
Old 08-06-2016, 10:27 AM   #6
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
I think Jellby is on the right track. ZIP will achieve very little, if any, further compression on a JPEG, which is already compressed. The trick will be to reduce the size of the JPEGs before creating the CBZ/ZIP.
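The point about compressing already-compressed data can be demonstrated in a few lines. This is only an illustration: Python's standard library has no JPEG encoder, so `zlib` output stands in for "data that is already compressed", and the sample text is made up.

```python
import random
import zlib

random.seed(1)  # fixed seed so the run is repeatable

# Made-up sample text, standing in for uncompressed data.
words = ["ankh", "scarab", "papyrus", "cartouche", "obelisk", "stela", "pharaoh", "nome"]
raw = " ".join(random.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(raw, 9)    # first pass: large saving
twice = zlib.compress(once, 9)  # second pass on compressed data: no real saving

print(len(raw), len(once), len(twice))
```

The second pass gains essentially nothing, which is the same reason zipping JPEGs into a CBZ does not shrink them.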

If you choose to use Photoshop, you should NEVER EVER use its "Save as..." option to create JPEGs for any CBZ, eBook, or web use. Use only its "Save for Web and Devices..." option. "Save as..." embeds a whole plethora of Photoshop-specific ancillary data (guides, ...) in the JPEG, bloating its size.

"Save for Web and Devices..." will not do this and offers additional options to strip even more metadata. As a result it will produce substantially smaller JPEGs when using the same quality settings.
Old 08-06-2016, 10:48 AM   #7
HarryT
eBook Enthusiast
Thanks, dwig. I use Adobe Lightroom, not Photoshop. LR doesn't add the junk that PS does.
Old 08-08-2016, 09:35 PM   #8
rkomar
Wizard
Posts: 2,986
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
I wrote myself a program for whitening the background of grayscale images. Given some threshold, if a pixel is lighter than the threshold, and all of the eight surrounding pixels are lighter than the threshold, the central pixel is set to be white. It works surprisingly well for such a simple idea. The solid white background will then compress much better than a noisy gray background. It also greatly improves the contrast on an e-ink device.
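A minimal sketch of the rule described above, written from the description rather than from the actual program. The function name, the default threshold of 200, and the handling of image borders (neighbours outside the image are simply ignored) are all assumptions.

```python
def whiten_background(image, threshold=200):
    """Set a pixel to white (255) when it and all its neighbours exceed threshold.

    `image` is a list of rows of 8-bit grey values. Neighbours that fall
    outside the image are ignored (an assumption; the post does not say
    how borders are handled).
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # The 3x3 block around (x, y), clipped to the image, centre included.
            neighbourhood = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            if all(value > threshold for value in neighbourhood):
                result[y][x] = 255
    return result


# Made-up 4x4 patch: noisy paper with one dark ink pixel at row 1, column 1.
patch = [
    [230, 220, 225, 240],
    [215, 30, 235, 228],
    [222, 226, 231, 219],
    [240, 233, 221, 227],
]
cleaned = whiten_background(patch)
for row in cleaned:
    print(row)
```

Light pixels that touch the ink pixel are left untouched, so the soft edges of the text survive, while paper further away becomes solid white.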
Old 08-09-2016, 09:01 AM   #9
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by HarryT
Any advice would be gratefully received!
I have noticed that highly compressed PDFs I've looked at over the years tend to (1) use 1-bit color layers and/or (2) use JPEG-2000 / JPX compression streams. For instance, if I scan a black and white document that I've marked with a red pen on the copiers where I work, at high compression settings, the copier scanning algorithm creates two layers: a black/white 1-bit layer for the black and white text plus a separate red/transparent 1-bit layer to overlay my red markups. It seems like a pretty sophisticated algorithm. I'm sure if you enhance your contrast ratio via some of the suggestions already made, you should be able to compress to fewer shades of gray--maybe even 1 bit (just black and white), but I'm not sure you quite have the resolution for that.
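To show what a 1-bit layer means in practice, here is a small sketch that thresholds greyscale pixels to black or white and packs eight pixels into each byte. It is not the copier's algorithm or any particular PDF codec; the function name, the threshold of 128 and the sample patch are made up for illustration.

```python
def to_bilevel_bytes(image, threshold=128):
    """Threshold 8-bit grey rows to 1 bit per pixel and pack 8 pixels per byte.

    A set bit means ink (pixel darker than threshold). Each row is padded
    to a whole number of bytes.
    """
    packed = bytearray()
    for row in image:
        for start in range(0, len(row), 8):
            byte = 0
            for offset, pixel in enumerate(row[start:start + 8]):
                if pixel < threshold:
                    byte |= 0x80 >> offset  # most significant bit first
            packed.append(byte)
    return bytes(packed)


# Made-up 2x8 patch: dark values are ink, light values are paper.
patch = [
    [250, 10, 20, 240, 245, 30, 250, 250],
    [250, 250, 250, 250, 15, 15, 15, 15],
]
bits = to_bilevel_bytes(patch)
print(bits.hex())
```

Sixteen 8-bit pixels shrink to two bytes before any further compression is applied, at the cost of losing every intermediate grey, which is why the available resolution matters.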
Old 08-09-2016, 09:11 AM   #10
HarryT
eBook Enthusiast
Quote:
Originally Posted by willus
I'm sure if you enhance your contrast ratio via some of the suggestions already made, you should be able to compress to fewer shades of gray--maybe even 1 bit (just black and white), but I'm not sure you quite have the resolution for that.
The original is a camera RAW file with a resolution of 6000x4000 pixels and a 14-bit pixel depth (i.e. 16,384 intensity levels). See the attached full-resolution sample: you can clearly see the fibres of the paper surface.
Attached Thumbnails
sample.jpg (28.7 KB)
Old 08-09-2016, 09:29 AM   #11
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by HarryT
Please find attached a couple of sample pages. These are at the minimum resolution I consider to be readable. I need to have the hieroglyphs sharp and clear. Any suggestions for significantly reducing the size of the page image would be very gratefully received.
I would aim at adjusting the levels to make the background pure white. You should be able to do this by adjusting curves so that the text itself doesn't become too washed out.

Reducing to greyscale may make a JPEG a bit smaller, but if the page is essentially all text (not greyscale images), reducing to 4-bit greyscale and saving as PNG might work even better.

Could you attach full-size page images, not already reduced to a smaller pixel size? I'd be interested in having a play this evening.
Old 08-15-2016, 03:12 PM   #12
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Hi HarryT

Maybe using another tool can help here.

Using the 2 pages from your post, I created a PDF using the method you describe: I created a CBZ and converted it to PDF with Calibre. The resulting PDF was 560 KB in size.

Using Adobe Acrobat Pro 11 and applying OCR (French) with the "ClearScan" option, the result was a PDF of 256 KB. From Wikipedia: "Adobe ClearScan technology creates and embeds custom Type1-CID fonts to match the visual appearance of a scanned document after optical character recognition. ClearScan uses these newly created custom fonts instead of system fonts or Type1-MM".

A similar result was obtained using FineReader Pro 11 and saving as an image-only PDF: the file was 266 KB in size.
Saving with the same option but activating the setting "use mixed raster content" gives an even smaller file size of 108 KB, but I do not advise using this option as the result is not good.

The final size of a PDF depends heavily on the complexity, color and other details of the original pages, so to give you an idea I would need access to at least 30 to 50 pages or, if possible, all the JPGs of a full book.

Notes:
1 - other more professional (and much more expensive) pieces of software can get you even smaller file sizes;
2 - all 3 example files created are attached.

Best regards,
Attached Files
File Type: pdf File via Acrobat.pdf (255.5 KB, 320 views)
File Type: pdf File via Finereader.pdf (265.9 KB, 388 views)
File Type: pdf File via Finereader with use mixed raster content.pdf (108.0 KB, 348 views)

Last edited by DDHarriman; 08-16-2016 at 02:53 PM.
All times are GMT -4.