Old 02-10-2009, 07:59 PM   #8
Hodapp87
Device: Bebook
I would probably do something like...
1) Use pdfimages to extract all images.
2) Open a few images and figure out a color curve that pushes all text to black and most background to white. Save that curve as a gradient image that ImageMagick can grok (for instance, as a lookup table for its -clut operator). Figure out where to set some basic crop boxes. This assumes
3) Use ImageMagick to crop and split the whole series of images in one batch.
4) Use ImageMagick to apply the color curve to all the images.
5) Do something useful with the series of converted images... DjVu, OCR, whatever.
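For what it's worth, steps 1 through 4 could be scripted roughly like this. It's only a sketch under a few assumptions of mine: poppler's pdfimages and ImageMagick's classic convert command are on the PATH (ImageMagick 7 calls it magick), the curve from step 2 was saved as a gradient image, and the crop boxes are ordinary WxH+X+Y geometry strings. The function names, file names and the sample geometry are all made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def extract_cmd(pdf, prefix):
    # Step 1: pdfimages writes prefix-000.ppm, prefix-001.pbm, ... by default.
    return ["pdfimages", str(pdf), str(prefix)]


def crop_cmd(src, box, dst):
    # Step 3: cut one crop box (WxH+X+Y) out of a page; +repage resets the canvas.
    return ["convert", str(src), "-crop", box, "+repage", str(dst)]


def curve_cmd(src, curve, dst):
    # Step 4: use the gradient image as a lookup table for the color curve.
    return ["convert", str(src), str(curve), "-clut", str(dst)]


def run_pipeline(pdf, workdir, boxes, curve):
    """Extract every image, then crop/split and recolor each one."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(pdf, workdir / "page"), check=True)
    # Collect the extracted images first; the outputs below are .png,
    # so they never match this pattern.
    for src in sorted(workdir.glob("page-*.p[bgp]m")):
        # Several boxes per image splits a two-page scan into separate pages.
        for i, box in enumerate(boxes):
            cropped = workdir / f"{src.stem}-{i}.png"
            final = workdir / f"{src.stem}-{i}-clean.png"
            subprocess.run(crop_cmd(src, box, cropped), check=True)
            subprocess.run(curve_cmd(cropped, curve, final), check=True)


if __name__ == "__main__":
    # Hypothetical usage: script.py book.pdf out curve.png 1200x1800+100+150 ...
    if len(sys.argv) < 5:
        sys.exit("usage: script.py PDF WORKDIR CURVE_IMAGE BOX [BOX ...]")
    run_pipeline(sys.argv[1], sys.argv[2], sys.argv[4:], sys.argv[3])
```

Giving it two crop boxes (left half, right half) handles the splitting in step 3. The crop and the curve could be folded into a single convert call, but keeping them separate makes it easier to eyeball the intermediate images before committing to a curve.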