Old 11-06-2021, 01:17 PM   #50
j.p.s
Grand Sorcerer
j.p.s ought to be getting tired of karma fortunes by now.
Posts: 5,819
Karma: 104541785
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by ownedbycats View Post
Also, what is up with Internet Archive's PDF compression? It rarely renders correctly on my Kobo, and instead I just get an image of text smudges. Even on my PC it's slow to render.
Quote:
Originally Posted by retiredbiker View Post
If you run pdfimages on one of these, you get out all sorts of crap. Black images, images of blurry smudges, images of real text, often inverted to white on black. Formats are mostly .ppm and .pbm. Without going into detail, I use ImageMagick, mostly, to end up with just the images I want. I read somewhere that the non-text images are masks, but I have no idea how a pdf uses them. I haven't seen this elsewhere, only from IA.
I suspect it has to do with the "Archive" part of the name. They are trying to preserve the books as well as they can. Volunteers supply scans. IA stores the data. Others, possibly in the far future, produce restored versions. The various files are meant as input to data processing rather than for viewing.

For example, I think the white-text-on-black images are intended as masks to be applied to the full-page color or grayscale scans, to help produce pure white backgrounds. IA also supplies masks for pages with line drawings, which makes cleaning up such images much easier.
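
To show what I mean by applying a mask, here is a rough sketch in plain Python. It is only my guess at the intended use, not anything IA documents. The function name `whiten_background` and the tiny pixel grids are made up for illustration, and I am assuming the mask is white (nonzero) where there is ink and black (zero) everywhere else.

```python
def whiten_background(scan, mask, white=255):
    """Return a copy of `scan` with every pixel outside the mask set to white.

    scan: rows of grayscale values (0 = black, 255 = white)
    mask: rows of the same shape; nonzero marks ink pixels to keep
          (assumed convention: white text on a black background)
    """
    if len(scan) != len(mask) or any(
        len(s) != len(m) for s, m in zip(scan, mask)
    ):
        raise ValueError("scan and mask must have the same dimensions")
    return [
        [pixel if keep else white for pixel, keep in zip(scan_row, mask_row)]
        for scan_row, mask_row in zip(scan, mask)
    ]


# Toy 3x4 "page": dark ink (20-40) on a yellowed background (around 200).
scan = [
    [200, 30, 25, 205],
    [198, 210, 40, 202],
    [201, 20, 35, 199],
]
# Matching white-on-black mask: 255 where there is ink, 0 elsewhere.
mask = [
    [0, 255, 255, 0],
    [0, 0, 255, 0],
    [0, 255, 255, 0],
]

cleaned = whiten_background(scan, mask)
for row in cleaned:
    print(row)
```

The ink pixels keep their original gray values and everything else becomes pure white. With real page images you would do the same thing with an image tool rather than nested lists, but the idea is the same.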