MobileRead Forums - View Single Post - An algorithm to render PDF in small devices

nrapallo · 05-04-2008, 11:16 PM

Overall, pi version 0.3 works well, but I ran into some obstacles trying to 'windows-ize' it.

I succeeded in converting the sample .pdf using 'pi_format chap6.conf' on a Windows PC, but only by brute force, so the approach cannot be used in general. More testing and exploring is needed to arrive at a Windows-only solution (in addition to the working Linux-based solution offered by the original poster).

In pi.py, I had to change the line that builds img_fn (bold in the original post) to match the output of pdftoppm.exe (from xpdf), which names its files like "chap6-004-page-000004.pgm", i.e. with a 6-digit page number before .pgm.

Code:

    def get_img(self, dpi = 150, out_prefix = None):
        pdf_fn = self.doc.pdf_fn
        if out_prefix is None:
            out_prefix = '%spage' % (self.output_prefix,)
        spage = '%d' % (self.page_no,)
        sdpi = '%d' % (dpi,)
        ret = call(['pdftoppm', '-r', sdpi, '-f', spage, '-l', spage, '-gray',
                    pdf_fn, out_prefix])
        assert(ret == 0)
        img_fn = '%s-%06d.pgm' % (out_prefix, self.page_no)
        return img_fn
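
Since the padding width evidently differs between pdftoppm builds, a more portable option might be to look for the file instead of hard-coding the format. The following is only a sketch: find_page_image is a hypothetical helper, not part of pi.py, and it assumes pdftoppm always names its output "<prefix>-<digits>.pgm".

```python
import glob
import os
import re


def find_page_image(out_prefix, page_no, ext=".pgm"):
    """Return the image written for page_no, whatever the zero padding.

    Hypothetical helper (not part of pi.py). Assumes the output is named
    "<out_prefix>-<digits><ext>".
    """
    # Compare the digits numerically, so "-4", "-004" and "-000004" all match.
    name_re = re.compile(
        re.escape(os.path.basename(out_prefix)) + r"-(\d+)" + re.escape(ext) + r"$"
    )
    for path in sorted(glob.glob(glob.escape(out_prefix) + "-*" + ext)):
        m = name_re.match(os.path.basename(path))
        if m and int(m.group(1)) == page_no:
            return path
    raise FileNotFoundError("no %s image found for page %d" % (ext, page_no))
```

In get_img, the img_fn line could then become something like img_fn = find_page_image(out_prefix, self.page_no), which should work with either naming scheme.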

pi.py was also crashing when the os.unlink(img_fn) line in parse (bold in the original post) was executed, so I commented it out. This leaves the .pgm behind, since deleting it fails for some unknown reason.

Traceback (most recent call last):
  File "pi_format.py", line 29, in <module>
  File "pi_format.py", line 7, in test_all
  File "pi.pyc", line 667, in __init__
  File "pi.pyc", line 704, in get_avg_page_stat
  File "pi.pyc", line 337, in __init__
  File "pi.pyc", line 386, in parse
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'out/chap6-004-page-000004.pgm'

Code:

    def parse(self, dpi = None):
        if dpi is None:
            dpi = self.dpi
        img_fn = self.get_img(dpi)
        p = Popen(['pi_page_parse', img_fn], stdout = PIPE)
        self.lines = []
        for l in p.stdout:
            ws = l.split()
            if  ws[0] == 'char':
                pair = map(int, ws[1:])
                ch = Char(pair)
                ln.append_char(ch)
            elif ws[0] == 'line':
                bbox = map(int, ws[1:])
                ln = Line(self, bbox)
                self.append_line(ln)
            else:
                self.bbox = map(int, ws[1:])
        self.img = Image.open(img_fn)
        #os.unlink(img_fn)
        self.set_space()
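
One possible explanation for Error 32, which I have not confirmed, is that something still holds the .pgm open when os.unlink runs: Image.open reads lazily and keeps the file open, and the pi_page_parse child is never waited on. Windows refuses to delete a file that is open. A minimal sketch of a workaround, assuming that is the cause, is to read the file into memory first; read_then_delete is a hypothetical helper, not part of pi.py.

```python
import io
import os


def read_then_delete(path):
    """Read path fully into memory, delete it, and return a seekable buffer.

    Hypothetical helper (not part of pi.py).
    """
    with open(path, "rb") as f:  # the handle is closed when this block exits
        data = f.read()
    os.unlink(path)              # none of our handles is open at this point
    return io.BytesIO(data)
```

Image.open accepts a file-like object, so parse could use self.img = Image.open(read_then_delete(img_fn)). Calling p.wait() after the stdout loop would also make sure the child process has exited before the file is removed.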

Then, just when I thought everything was working, I started getting random aborts caused by PIL problems reading/writing the .pgm files, as the IOError on the last line below shows (bold in the original post):

Code:

page: 4
Error: No display font for 'Symbol'
Error: No display font for 'ZapfDingbats'
Traceback (most recent call last):
  File "pi_format.py", line 29, in <module>
  File "pi_format.py", line 8, in test_all
  File "pi.pyc", line 722, in reformat
  File "pi.pyc", line 605, in divide
  File "pi.pyc", line 647, in put_seg
  File "pi.pyc", line 109, in get_img
  File "Image.pyc", line 737, in crop
  File "ImageFile.pyc", line 192, in load
IOError: image file is truncated (1111 bytes not processed)

The odd thing is that the .pgm image files appear to be fine even though I get the 'truncated' message. The only way I got the run to finish was to generate all the .pgm files first, protect them from being overwritten by marking them read-only, and then let 'pi_format chap6.conf' run to completion.
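
The read-only step can be scripted rather than done by hand. This is only a sketch of that one step: mark_read_only is a hypothetical helper, not part of pi.py, and it does not explain why the files get overwritten in the first place.

```python
import os
import stat


def mark_read_only(paths):
    """Clear the write bits on each file and return the paths processed.

    Hypothetical helper (not part of pi.py). On Windows, os.chmod only
    toggles the read-only flag, which is all that is needed here.
    """
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    done = []
    for path in paths:
        mode = os.stat(path).st_mode
        os.chmod(path, mode & ~write_bits)
        done.append(path)
    return done
```

For example, mark_read_only(glob.glob('out/*.pgm')) after the pdftoppm pass (with import glob) would protect every page image in the out directory.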

In the end, I was able to collect all the generated .gifs and create a 1150 .imp ebook (and the first 17 pages only for Kindle/Cybook .prc and Sony .lrf ebooks). The results are far from perfect, but promising.