Old 05-04-2008, 11:16 PM   #52
nrapallo
GuteBook/Mobi2IMP Creator
Overall, pi version 0.3 works well, but I ran into some obstacles trying to 'Windows-ize' it.

I succeeded in converting the sample .pdf using 'pi_format chap6.conf' on a Windows PC, but only through a brute-force workaround that cannot be used in general. More testing and exploring is required to arrive at a Windows-only solution (in addition to the working Linux-based solution offered by the original poster).

In pi.py, I had to change the line that builds img_fn (the second-to-last line below) to match the output names of pdftoppm.exe (from xpdf), which have the form "chap6-004-page-000004.pgm", i.e. a 6-digit zero-padded page number before .pgm.
Code:
def get_img(self, dpi = 150, out_prefix = None):
        pdf_fn = self.doc.pdf_fn
        if out_prefix is None:
            out_prefix = '%spage' % (self.output_prefix,)
        spage = '%d' % (self.page_no,)
        sdpi = '%d' % (dpi,)
        ret = call(['pdftoppm', '-r', sdpi, '-f', spage, '-l', spage, '-gray',
                    pdf_fn, out_prefix])
        assert(ret == 0)
        img_fn = '%s-%06d.pgm' % (out_prefix, self.page_no)
        return img_fn
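Since the padding width apparently depends on which pdftoppm build is installed, one alternative to hard-coding '%06d' is to look for whatever file was actually written. This is only a rough sketch in current Python, not part of pi.py; the helper name find_page_image is made up for illustration, and it assumes the file name is always the prefix, a dash, the (possibly zero-padded) page number, and the extension.

```python
import glob
import os
import re


def find_page_image(out_prefix, page_no, ext=".pgm"):
    """Return the image written for page_no, whatever the zero padding.

    Hypothetical helper: accepts 'prefix-4.pgm', 'prefix-004.pgm',
    'prefix-000004.pgm', and so on.
    """
    base = os.path.basename(out_prefix)
    # Match "<base>-<optional zeros><page_no><ext>" against the bare file name.
    wanted = re.compile(
        r"%s-0*%d%s" % (re.escape(base), page_no, re.escape(ext))
    )
    # glob.escape keeps special characters in the prefix from acting as wildcards.
    for fn in sorted(glob.glob(glob.escape(out_prefix) + "-*" + ext)):
        if wanted.fullmatch(os.path.basename(fn)):
            return fn
    raise IOError(
        "no image found for page %d with prefix %r" % (page_no, out_prefix)
    )
```

With something like this, get_img could return find_page_image(out_prefix, self.page_no) instead of building the name by hand, at the cost of a directory scan per page.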
Also, pi.py was crashing when the os.unlink(img_fn) line in parse() was executed, so I commented it out, as shown further below. The catch is that the .pgm files are now left behind, since deleting them fails for a reason I have not pinned down. The traceback was:
Traceback (most recent call last):
  File "pi_format.py", line 29, in <module>
  File "pi_format.py", line 7, in test_all
  File "pi.pyc", line 667, in __init__
  File "pi.pyc", line 704, in get_avg_page_stat
  File "pi.pyc", line 337, in __init__
  File "pi.pyc", line 386, in parse
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'out/chap6-004-page-000004.pgm'
Code:
def parse(self, dpi = None):
        if dpi is None:
            dpi = self.dpi
        img_fn = self.get_img(dpi)
        p = Popen(['pi_page_parse', img_fn], stdout = PIPE)
        self.lines = []
        for l in p.stdout:
            ws = l.split()
            if  ws[0] == 'char':
                pair = map(int, ws[1:])
                ch = Char(pair)
                ln.append_char(ch)
            elif ws[0] == 'line':
                bbox = map(int, ws[1:])
                ln = Line(self, bbox)
                self.append_line(ln)
            else:
                self.bbox = map(int, ws[1:])
        self.img = Image.open(img_fn)
        #os.unlink(img_fn)
        self.set_space()
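A possible explanation, and it is only a guess: PIL's Image.open is lazy and keeps the file open until the pixel data is actually loaded, and Windows refuses to delete a file that still has an open handle (the pi_page_parse child process could also still be holding it). One way around that would be to read the file into memory, close it, delete it, and only then hand the bytes to PIL. A minimal sketch in current Python, with the helper name slurp_and_delete invented for illustration:

```python
import io
import os


def slurp_and_delete(path):
    """Read the whole file into memory, delete it, and return a BytesIO.

    Hypothetical helper: after it returns, this process holds no open
    handle on the file, so the delete should not fail on that account.
    """
    with open(path, "rb") as fh:  # the with-block closes the handle on exit
        data = fh.read()
    os.unlink(path)  # delete only after our own handle is closed
    return io.BytesIO(data)
```

Since Image.open also accepts a file-like object, the last lines of parse() could then become self.img = Image.open(slurp_and_delete(img_fn)), with no separate os.unlink call. I have not tested this against pi.py itself, and it may also be necessary to wait for pi_page_parse to exit (p.wait()) before deleting.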
But then, just when I thought everything was working, I started getting random aborts due to PIL .pgm reading/writing problems, as in the final IOError line below:
Code:
page: 4
Error: No display font for 'Symbol'
Error: No display font for 'ZapfDingbats'
Traceback (most recent call last):
  File "pi_format.py", line 29, in <module>
  File "pi_format.py", line 8, in test_all
  File "pi.pyc", line 722, in reformat
  File "pi.pyc", line 605, in divide
  File "pi.pyc", line 647, in put_seg
  File "pi.pyc", line 109, in get_img
  File "Image.pyc", line 737, in crop
  File "ImageFile.pyc", line 192, in load
IOError: image file is truncated (1111 bytes not processed)
The odd thing is that the .pgm image files appear to be fine even though I get the 'truncated' message. The only way I got it to finish was to generate all the .pgm files first, protect them from being overwritten by marking them read-only, and then let 'pi_format chap6.conf' run to completion.
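For anyone wanting to repeat that workaround without clicking through file properties, the read-only step can be scripted. This is just a sketch in current Python, assuming the pre-generated images sit in one directory (out/ in my case); the function name protect_images is made up. On Windows, os.chmod can only toggle the read-only flag, which is all that is needed here.

```python
import glob
import os
import stat


def protect_images(directory, pattern="*.pgm"):
    """Mark every matching file read-only and return the sorted paths.

    Hypothetical helper for the brute-force workaround: run it after all
    the .pgm files have been generated and before starting pi_format.
    """
    paths = sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
    for path in paths:
        # S_IREAD sets the read-only attribute on Windows
        # (and mode 0400 on Unix-like systems).
        os.chmod(path, stat.S_IREAD)
    return paths
```

Calling protect_images("out") would cover the files from the run above. To clean up afterwards, the files need os.chmod(path, stat.S_IWRITE) before they can be deleted on Windows.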

In the end, I was able to collect all the generated .gifs and create a 1150 .imp ebook (plus Kindle/Cybook .prc and Sony .lrf ebooks covering the first 17 pages only). The results are far from perfect, but promising.
Attached Files
File Type: imp chap6-001-0.imp (2.42 MB, 503 views)
File Type: prc chap6-001-0-pages1-17.prc (883.1 KB, 2834 views)
File Type: lrf chap6-001-0-pages1-17.lrf (554.6 KB, 483 views)
File Type: zip gif-pages1-17.zip (1.21 MB, 500 views)

Last edited by nrapallo; 05-04-2008 at 11:23 PM. Reason: added resulting .gif for first 17 pages in ebook for viewing