Old 09-24-2007, 05:11 PM   #189
ereszet
Zealot
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
Unpaper

Quote:
Originally Posted by cacapee
Have you looked into unpaper?
I was in touch with the author a few weeks ago to learn how to pipe JPG images into unpaper and save the results as JPG as well. His advice was good for Linux, but I have not managed to reproduce it in DOS yet (the DOS version packaged with pdfread does work well with ppm/pbm/pgm files, however).

Both versions, Linux and DOS, work nicely in batch mode. After some experimenting with the parameters, one can clean black spots, lines and blobs from the image, leaving masks (blocks of text or image) surrounded by whitespace. His conversion to black and white is based on the threshold method, which is not sufficient for poor-quality originals. Keep in mind that even when the cleaning parameters are tuned to clean one page, the same settings may damage other pages by removing text as well.
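
To show why a plain threshold struggles with poor originals, here is a minimal Python sketch. It is not unpaper's code; the function name, the cutoff of 128 and the pixel values are all made up for illustration. Each grey value is compared against one global cutoff, so a shadowed background that falls below the cutoff turns black along with the text.

```python
def threshold(gray, cutoff=128):
    """Map each 0-255 grey value to 0 (black) or 255 (white) using one global cutoff.

    Hypothetical helper for illustration only, not taken from unpaper.
    """
    return [[0 if px < cutoff else 255 for px in row] for row in gray]


# Clean scan: dark text (40) on a light background (220).
clean = [[220, 40, 220, 40]]

# Poor scan: the right half sits in a shadow, so its background (110)
# is darker than the cutoff even though it is not text.
shadowed = [[220, 40, 110, 30]]

clean_bw = threshold(clean)
shadowed_bw = threshold(shadowed)

print(clean_bw)
print(shadowed_bw)
```

In the shadowed row the third pixel is background, yet it comes out black and merges with the text next to it. Lowering the cutoff would rescue that pixel but could drop faint text elsewhere on the page, which is the trade-off described above.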

A nice feature of unpaper is that it splits double-page scans and replaces the dark shadow between the pages or at the margins with whitespace.

I asked the author to consider trimming the whitespace automatically once the program has recognized the masks. He may do that in the future, but not too soon.
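
What I have in mind by trimming can be sketched in a few lines of Python. This is only my illustration of the idea, not anything unpaper does today; the function names and the tiny sample page are invented. It finds the bounding box of everything that is not white and crops the page to it.

```python
def content_box(bw, white=255):
    """Return (top, left, bottom, right), inclusive, of all non-white pixels.

    Returns None if the page is empty or entirely white.
    Hypothetical helper for illustration only.
    """
    if not bw or not bw[0]:
        return None
    rows = [y for y, row in enumerate(bw) if any(px != white for px in row)]
    if not rows:
        return None
    cols = [x for x in range(len(bw[0])) if any(row[x] != white for row in bw)]
    return rows[0], cols[0], rows[-1], cols[-1]


def trim(bw, white=255):
    """Crop the page to its content box; leave an all-white page unchanged."""
    box = content_box(bw, white)
    if box is None:
        return bw
    top, left, bottom, right = box
    return [row[left:right + 1] for row in bw[top:bottom + 1]]


# A 4x5 page with a small block of "text" surrounded by whitespace.
page = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255, 255, 255, 255],
]

trimmed = trim(page)
print(trimmed)
```

A single stray speck near the edge would stretch the box, so this only makes sense after the cleaning step has already removed spots and shadows.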

For now, it is a good free preprocessing tool for pdflrf.