
MobileRead Forums > E-Book Formats > PDF

01-02-2023, 01:12 PM   #1
icq70610
Enthusiast
Posts: 49
Karma: 510
Join Date: Sep 2008
Device: PSR505
Question: Ghostscript CCITT with pdfwrite

Hi all

another rather specific question...

I'm currently (OK, for 3-4 years now) digitizing my whole library from childhood, and I have become kind of obsessed with scanning procedures.

My current workflow looks like this:

Scan --> ScanTailor --> lots of clicking --> TIFF --> mogrify stuff --> PS --> PDF

The results are really good and I'm normally quite fond of them; for file size versus quality at 300x300 dpi I would give them an 8-9 out of 10. My "trouble" starts with books from like-minded people making similar efforts. An example document looks like this:

-----
Creator: PDF-XChange Editor 5.5.xxx
Producer: PDF-XChange PDF Core API (5.5.xxx)
CreationDate: xxx xxx
ModDate: xxx xxx
Custom Metadata: no
Metadata Stream: yes
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 164
Encrypted: no
Page size: 372 x 559.68 pts
Page rot: 0
File size: 8100115 bytes
Optimized: no
PDF version: 1.2
page   num type   width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
--------------------------------------------------------------------------------------------
   1     0 image   1550   2332 rgb      3   8 jpeg  no        170  0   300   300  371K  3.5%
   2     1 image   1550   2332 index    1   1 ccitt no        172  0   300   300   17B  0.0%
   3     2 image   1550   2332 index    1   1 ccitt no        174  0   300   300 9123B  2.0%
   4     3 image   1550   2332 index    1   1 ccitt no        176  0   300   300 7332B  1.6%
   5     4 image   1550   2332 index    1   1 ccitt no        178  0   300   300 36.8K  8.3%
   6     5 image   1550   2332 index    1   1 ccitt no        180  0   300   300 42.6K  9.7%

----

As you can see, the cover takes 371K at 300x300 dpi and a "traditional" black-and-white page around 42K. The cover is an RGB JPEG and the rest are CCITT (fax, as in TIFF G4) encoded.
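The listing above looks like the output of poppler-utils' `pdfinfo` and `pdfimages -list`. A minimal bash sketch, assuming those tools are installed, for reproducing it and tallying how many images use each encoding (the helper names are hypothetical):

```shell
# Hypothetical helpers; need bash plus poppler-utils (pdfinfo, pdfimages).

# Print the document summary and the per-image table for one PDF.
inspect_pdf() {
  pdfinfo "$1"
  pdfimages -list "$1"
}

# Read `pdfimages -list` output on stdin and count images per encoding.
# The first two lines are the header and the dashed rule; column 9 is "enc".
count_encodings() {
  awk 'NR > 2 { n[$9]++ } END { for (e in n) print e, n[e] }' | sort
}
```

Usage: `pdfimages -list book.pdf | count_encodings` prints one line per encoding (for instance `ccitt` and `jpeg` for the file above), which makes it quick to check whether a re-encoded PDF actually kept CCITT.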

Does anybody have an idea how to instruct Ghostscript on the command line to get similar encodings?

When I encode them, it looks like this:

page   num type   width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
--------------------------------------------------------------------------------------------
   1     0 image   1550   2332 icc      3   8 jpeg  no          9  0   300   300  389K  3.7%
   2     1 image   1550   2332 index    1   1 image no         16  0   300   300  461B  0.1%
   3     2 image   1550   2332 index    1   1 image no         22  0   300   300 19.7K  4.5%
   4     3 image   1550   2332 index    1   1 image no         28  0   300   300 14.4K  3.3%
   5     4 image   1550   2332 index    1   1 image no         34  0   300   300 64.4K   15%
   6     5 image   1550   2332 index    1   1 image no         40  0   300   300 74.3K   17%
   7     6 image   1550   2332 index    1   1 image no         46  0   300   300 76.3K   17%
   8     7 image   1550   2332 index    1   1 image no         52  0   300   300 76.5K   17%
   9     8 image   1550   2332 index    1   1 image no         58  0   300   300 75.7K   17%


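As you can see, with Ghostscript I'm not getting CCITT encoding: the 1-bit pages come out as plain "image" streams and are noticeably larger (e.g. 74.3K vs 42.6K for page 6). Does anyone know the correct parameter for gs, even just to encode a single page as CCITT with pdfwrite as the device?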

\Pete
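For the pdfwrite route asked about here, Ghostscript documents Distiller-style image parameters such as `EncodeMonoImages`, `MonoImageFilter` and `DownsampleMonoImages`. A hedged sketch that requests CCITT explicitly is below; whether pdfwrite honours it for 1-bit images stored with an *indexed* colour space (as in both listings) is not confirmed here, so check the result with `pdfimages -list`. The function name is hypothetical, and the `GS` variable only exists to allow swapping in another binary.

```shell
# Hypothetical helper: rewrite a PS/PDF file with pdfwrite, asking for
# CCITT fax encoding of 1-bit (monochrome) images. GS overrides the binary.
gs_pdf_ccitt() {
  local src=$1 dst=$2
  "${GS:-gs}" -q -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pdfwrite \
    -dEncodeMonoImages=true \
    -dMonoImageFilter=/CCITTFaxEncode \
    -dDownsampleMonoImages=false \
    -sOutputFile="$dst" "$src"
}
```

Usage: `gs_pdf_ccitt page-111.ps page-111.pdf`. If the images still show up as `image` rather than `ccitt`, the input images are probably not being treated as monochrome, which is one reason to hand pdfwrite (or another wrapper) data that is already G4-encoded.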
01-02-2023, 06:06 PM   #2
rkomar
Wizard
Posts: 2,986
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
TIFF is more about the wrapping than the encoding: you can compress using many different methods within a TIFF wrapper. I would suggest that you produce your black-and-white images as CCITT G4-encoded TIFF files before calling gs to create the PDF file (i.e. use the option "-compress Group4" when running mogrify/convert). I'm not familiar with ScanTailor, but maybe it offers that option out of the box.
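Producing G4 TIFFs up front could look like the sketch below (assumes ImageMagick 6's `convert`; on ImageMagick 7 use `magick`). Thresholding to pure black and white first is an added assumption, since CCITT Group 4 only applies to 1-bit images; the 50% default is a guess to tune per book, and the function name is hypothetical.

```shell
# Hypothetical helper: binarize one scan and save it as a CCITT G4 TIFF.
# CONVERT overrides the binary (e.g. CONVERT=magick on ImageMagick 7).
to_g4_tiff() {
  local src=$1 dst=$2 threshold=${3:-50%}
  "${CONVERT:-convert}" "$src" \
    -threshold "$threshold" \
    -type bilevel \
    -compress Group4 \
    "$dst"
}
```

Usage: `to_g4_tiff scan-012.tif page-012.tif 55%`. For a whole folder in place, `mogrify -threshold 50% -type bilevel -compress Group4 *.tif` should do the same, but it overwrites the originals.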

When I was first scanning my books, I would use convert to produce the TIFF files with CCITT G4 compression. Then I would use tiffcp to combine the separate TIFF files into a single multi-page TIFF file, and then either tumble or tiff2pdf to convert the multi-page TIFF file into a PDF file. Finally I would use gs to add pdfmarks to the PDF file.
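The libtiff part of that older workflow might be sketched as follows (assumes libtiff's `tiffcp` and `tiff2pdf`; the function name is hypothetical and the final gs pdfmarks step is left out). `tiff2pdf` normally passes G4 data through without decoding it (its `-n` option disables that passthrough), so the pages should stay CCITT-encoded in the PDF.

```shell
# Hypothetical helper: merge single-page TIFFs into one multi-page G4 TIFF,
# then wrap it as a PDF. TIFFCP / TIFF2PDF override the binaries.
tiffs_to_pdf() {
  local dst=$1; shift
  local multi=${dst%.pdf}.tif
  # -c g4 makes tiffcp store every page with CCITT Group 4 compression
  "${TIFFCP:-tiffcp}" -c g4 "$@" "$multi"
  # tiff2pdf writes one PDF page per TIFF directory (page)
  "${TIFF2PDF:-tiff2pdf}" -o "$dst" "$multi"
}
```

Usage: `tiffs_to_pdf book.pdf page-*.tif`, assuming the page files are zero-padded so the glob sorts in page order.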

Nowadays I use pdfbeads, but that has become more complicated than my old way because the program is no longer maintained and is very difficult to get working on a modern system. I run my old copy of pdfbeads in an old Linux distro inside VirtualBox.
01-03-2023, 02:00 PM   #3
icq70610
Enthusiast
Posts: 49
Karma: 510
Join Date: Sep 2008
Device: PSR505
Thank you for the quick answer. pdfbeads is an interesting idea (the VM part I get :-) ). As for the other points above, that's exactly what I'm currently doing, and I wrote a crude bash wrapper for tryouts.

Basically it's the following:

qpdf --> explode all PDFs into single-page PDFs
gs --> convert each page PDF to a b/w TIFF (tiffg4 device), e.g.:
gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg4 -r300x300 -dFirstPage=1 -dLastPage=1 -sOutputFile=111.tif page-111.pdf
img2pdf --> loop over the TIFFs and wrap each one in a PDF; img2pdf is a "raw" wrapper that does not re-encode the image: https://gitlab.mister-muffin.de/josch/img2pdf
then --> bulk all the PDFs together into one combined PDF.
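Put together, those steps might look like this bash sketch. It assumes qpdf, Ghostscript and img2pdf are on the PATH; the function name, the temporary layout and the use of qpdf for the final merge are assumptions (the post does not say which tool does the merge). qpdf's `--split-pages` replaces `%d` with a zero-padded page number, so the globs sort in page order.

```shell
# Hypothetical helper: rebuild a PDF so that every page becomes a 300 dpi
# CCITT G4 image. QPDF / GS / IMG2PDF override the binaries.
rebuild_as_g4() {
  local src=$1 dst=$2 work page
  work=$(mktemp -d)
  # 1) one PDF per page; %d becomes a zero-padded page number
  "${QPDF:-qpdf}" --split-pages "$src" "$work/page-%d.pdf"
  for page in "$work"/page-*.pdf; do
    # 2) render the page to a 1-bit TIFF with CCITT G4 compression
    "${GS:-gs}" -q -dBATCH -dNOPAUSE -dSAFER -sDEVICE=tiffg4 -r300x300 \
      -sOutputFile="${page%.pdf}.tif" "$page"
    # 3) wrap the TIFF in a PDF page; img2pdf embeds the G4 data as-is
    "${IMG2PDF:-img2pdf}" "${page%.pdf}.tif" -o "${page%.pdf}-g4.pdf"
  done
  # 4) merge the single-page PDFs, in glob (= page) order, into one file
  "${QPDF:-qpdf}" --empty --pages "$work"/page-*-g4.pdf -- "$dst"
  rm -rf "$work"
}
```

Usage: `rebuild_as_g4 original.pdf original-g4.pdf`. img2pdf also accepts several images at once, so steps 3 and 4 could probably be collapsed into a single img2pdf call over all the TIFFs.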

Rather crude, but I get good compression on b/w images with minimal effort and quite reasonable quality. (Please be aware that the input images should already be b/w; if they are grey, the tiffg4 encoding sometimes gives funky results.)
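On the grey-input caveat: the tiffg4 device can only produce 1-bit output, so grey content has to be halftoned, which is probably where the funky results come from. One possible workaround, assumed rather than tested here, is to render to 8-bit grey with Ghostscript's `tiffgray` device and then apply a plain threshold with ImageMagick before G4 compression. The function name and the 50% default are hypothetical.

```shell
# Hypothetical helper: render one PDF page as 8-bit grey, then threshold
# it to 1-bit and store it as CCITT G4. GS / CONVERT override the binaries.
gray_page_to_g4() {
  local src=$1 dst=$2 threshold=${3:-50%} gray
  gray=${dst%.tif}-gray.tif
  "${GS:-gs}" -q -dBATCH -dNOPAUSE -dSAFER -sDEVICE=tiffgray -r300x300 \
    -sOutputFile="$gray" "$src"
  "${CONVERT:-convert}" "$gray" -threshold "$threshold" -type bilevel \
    -compress Group4 "$dst"
  rm -f "$gray"   # drop the intermediate grey TIFF
}
```

Usage: `gray_page_to_g4 page-111.pdf 111.tif 55%`. A hard threshold loses the fake grey shading that halftoning keeps, which suits text pages better than illustrations.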

Thought I'd share. Topic closed on my end, but it was hellishly frustrating :-) to get a grip on that.

\Pete
01-03-2023, 06:07 PM   #4
Quoth
the rook, bossing Never.
Posts: 11,161
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
ImageMagick, or the GIMP (import as layers)
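If ImageMagick is meant as the tool that writes the PDF itself, a hedged sketch: ImageMagick's PDF writer accepts `-compress Group4` for bilevel images, which should give CCITT-encoded pages. This is not verified against the files in this thread, and the function name is hypothetical.

```shell
# Hypothetical helper: let ImageMagick write the PDF directly, storing each
# 1-bit page with CCITT Group 4. CONVERT overrides the binary.
tiffs_to_pdf_im() {
  local dst=$1; shift
  "${CONVERT:-convert}" "$@" -type bilevel -compress Group4 "$dst"
}
```

Usage: `tiffs_to_pdf_im book.pdf page-*.tif`. ImageMagick decodes and re-encodes every page in memory, so for a long book the tiff2pdf or img2pdf routes are likely lighter.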




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.