MobileRead Forums - View Single Post

nrapallo · 06-19-2009, 09:43 PM

Quote:

Originally Posted by projectxz2005

Cool. Thanks man! Alas, I woke up this morning to a Kindle 2 that doesn't turn on. Returning it tomorrow.

Just wondering about the whitespace removal feature of pdfread... Could you tell me some more about it? I gather it's native to the script as opposed to ImageMagick?

Not much to tell, as I didn't write the code myself. It is basically a cropping mechanism that scans the page vertically and horizontally and stops at (almost) the first non-white "dot". The "almost" is there because the crop percentage can shift that cropping line a bit.
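To make the "first non-white dot" idea concrete, here is a rough sketch of my own (not pdfread code). It works on a plain list of grayscale rows rather than a PIL image, and the name content_bbox and the white=255 default are just assumptions for the illustration. The result follows the same (left, top, right, bottom) convention that PIL's getbbox() uses, with right and bottom exclusive.

```python
def content_bbox(pixels, white=255):
    """Return (left, top, right, bottom) of the non-white area, or None.

    pixels is a list of rows, each row a list of grayscale values.
    This is a hypothetical helper for illustration only.
    """
    # rows that contain at least one non-white value
    rows = [y for y, row in enumerate(pixels)
            if any(v != white for v in row)]
    if not rows:
        # the page is entirely white
        return None

    # columns that contain at least one non-white value
    width = len(pixels[0])
    cols = [x for x in range(width)
            if any(row[x] != white for row in pixels)]

    # right and bottom are exclusive, like a PIL bounding box
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


page = [[255] * 5 for _ in range(4)]   # 5x4 white page
page[1][1] = 0                         # one black dot
page[2][3] = 40                        # one dark gray dot
print(content_bbox(page))              # (1, 1, 4, 3)
```

The script gets the same kind of box from PIL by inverting the image first, so that white becomes 0 and getbbox() ignores it.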

The Python script, process.py, has two functions that do the cropping, namely:

crop

Code:

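# excerpt from process.py; relies on PIL's Image and ImageChops modules, plus
# p() and DEFAULT_CROP_PERCENT, which come from elsewhere in the script
def crop(input, percent=DEFAULT_CROP_PERCENT):
  """ perform image cropping via whitespace detection """
  p('CROP ')
  w, h = input.size
  # invert so that white becomes 0; getbbox() then bounds the non-white content
  img  = ImageChops.invert(input)
  box  = img.getbbox()
  if box is None:
    # the page is entirely white, so there is nothing to crop
    return None

  l, t, r, b = box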

  # crop horizontal blank areas
  temp = crop_axis(input, img, t, b, percent,
                   lambda s, e: (0, s, w, e),
                   lambda s   : (w, s),
                   lambda s   : (0, s))

  w, h = temp.size
  img  = ImageChops.invert(temp)

  # crop vertical blank areas
  return crop_axis(temp, img, l, r, percent,
                   lambda s, e: (s, 0, e, h),
                   lambda s   : (s, h),
                   lambda s   : (s, 0))

and
crop_axis

Code:

def crop_axis(input, img, start, end, percent,
              func_crop, func_size, func_pos):
  """ internal function for cropping a single axis """

  # compute optimal step and size for given axis percentage
  size = min(MAX_CROP_SIZE,  max(1, int((end-start)*percent/100)))
  step = min(MAX_CROP_STEP,  max(1, int(size/10)))

  content    = []
  begin      = start
  blank_area = False
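  # slide a window of `size` pixels along the axis, `step` pixels at a time
  for i in range(start, end, step):
    test = img.crop( func_crop(i, i+size) )
    # img is inverted, so a window with no bounding box is pure white
    if test.getbbox() is None: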
      if not blank_area:
        # we've hit a blank area, so save content area
        content.append( (begin, i) )
        blank_area = True
    else:
      if blank_area:
        # we've moved out of a blank area, mark beginning
        begin = i
        blank_area = False

  # handle the last leftover area
  if not blank_area and end > begin:
    content.append( (begin, end) )

  # create image with shrunken axis
  newI   = sum([last-first for first, last in content])
  output = Image.new(input.mode, func_size(newI), None)

  # append the content parts
  i = 0
  for first, last in content:
    output.paste( input.crop(func_crop(first, last)), func_pos(i) )
    i += (last - first)

  # return output
  return output
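If it helps to see the scanning loop on its own, below is a simplified one-dimensional sketch of it (again my own illustration, not part of process.py). Instead of cropping an image, it takes a list of flags saying whether each row (or column) has any non-white pixel. The names content_ranges and has_ink are made up for this example, and size and step are passed in directly rather than derived from the percentage.

```python
def content_ranges(has_ink, start, end, size, step):
    """Return the (first, last) spans a crop_axis-style scan would keep.

    has_ink[i] is True when row (or column) i holds a non-white pixel.
    Hypothetical helper mirroring the loop in crop_axis.
    """
    def window_blank(i):
        # a window is blank when none of its rows has ink; rows past the
        # end of the list simply do not count
        return not any(has_ink[i:i + size])

    content = []
    begin = start
    blank_area = False
    for i in range(start, end, step):
        if window_blank(i):
            if not blank_area:
                # entering a blank stretch: close the current content span
                content.append((begin, i))
                blank_area = True
        elif blank_area:
            # leaving a blank stretch: a new content span starts here
            begin = i
            blank_area = False

    # keep whatever content runs up to the end
    if not blank_area and end > begin:
        content.append((begin, end))
    return content


# 3 inked rows, 6 blank rows, 3 inked rows
rows = [True] * 3 + [False] * 6 + [True] * 3
print(content_ranges(rows, 0, len(rows), size=2, step=1))
# [(0, 3), (8, 12)]
```

Note that the second span starts at row 8, which is still blank: a window is only treated as blank when all of it is blank, so a sliver of white (up to size - 1 rows) survives in front of the next piece of content. That is the "almost" mentioned above, and a larger crop percentage gives a larger window and so a wider leftover margin.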