An algorithm to render PDF in small devices - Page 6

Hanselda · 08-06-2008, 05:59 AM

That is really excellent work. In fact I made something very similar. But I did not go as far as to analyze the image!

Some ideas:
1. Try the pdfimage command from pdflib; it can compile all the PNG images directly into a single PDF, which is much faster than running convert again.

2. Try to quantize the colors of the PNG files. This will reduce the image file size significantly. E-ink screens show only 4 to 16 gray levels, compared with the 256 levels of a standard 8-bit channel.

3. This method can in fact also work for DjVu files. With the ddjvu command one can convert a given page into PGM:
'ddjvu -page=%i -scale=%i -format=pgm %s %s' %(pageno, dpi, inputfile, outputfile)
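Ideas 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: gray_lut and ddjvu_args are hypothetical helper names and not part of pi.py, and the ddjvu options are exactly the ones quoted above.

```python
def gray_lut(levels):
    """Build a 256-entry lookup table that snaps 8-bit gray to `levels` shades."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255.0 / (levels - 1)  # distance between two neighbouring output shades
    return [int(round(round(v / step) * step)) for v in range(256)]


def ddjvu_args(pageno, scale, inputfile, outputfile):
    """Same command as the format string above, as an argument list (no shell quoting needed)."""
    return ["ddjvu", "-page=%d" % pageno, "-scale=%d" % scale,
            "-format=pgm", inputfile, outputfile]


# Assumption: with PIL, a grayscale ("L" mode) image could be reduced with
#     img = img.point(gray_lut(16))
# and the argument list could be handed to subprocess.call().
print(sorted(set(gray_lut(4))))
print(ddjvu_args(3, 150, "book.djvu", "page3.pgm"))
```

Passing the command as a list rather than a formatted string avoids trouble with spaces in file names.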

kentsin · 08-10-2008, 03:36 AM

I tried another PDF and got:

page: 1
page: 2
Traceback (most recent call last):
  File "/home/kentsin/pi/bin/pi_format.py", line 59, in <module>
    test_all(sys.argv[1])
  File "/home/kentsin/pi/bin/pi_format.py", line 16, in test_all
    doc.reformat()
  File "/home/kentsin/pi/bin/pi.py", line 1154, in reformat
    page = Page(self, pn)
  File "/home/kentsin/pi/bin/pi.py", line 690, in __init__
    BasicPage.__init__(self, doc, page_no, dpi)
  File "/home/kentsin/pi/bin/pi.py", line 569, in __init__
    self.dpi = self.get_dpi()
  File "/home/kentsin/pi/bin/pi.py", line 695, in get_dpi
    dpi = self.doc.target_width * 50 / width
ZeroDivisionError: float division
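
The last frame shows that width is zero when get_dpi divides by it; the traceback does not show how width is measured. A minimal defensive sketch, assuming that a fixed fallback resolution is acceptable for such pages (safe_dpi and DEFAULT_DPI are hypothetical names, not part of pi.py, and this is not the author's fix):

```python
DEFAULT_DPI = 150  # assumption: an arbitrary fallback resolution for this sketch


def safe_dpi(target_width, width, fallback=DEFAULT_DPI):
    """Compute target_width * 50 / width, but fall back when width is not positive."""
    if width <= 0:
        # Dividing here is what raised ZeroDivisionError in the traceback above.
        return fallback
    return target_width * 50.0 / width


print(safe_dpi(600, 300))
print(safe_dpi(600, 0))
```

Whether a fallback or skipping the page is the better behaviour depends on why width comes out as zero for this PDF.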

caritas · 08-11-2008, 08:59 AM

Version 0.6 is released. Binary and source can be downloaded from the first post of this thread.

ChangeLog:

2008-08-11 Huang Ying <ying.huang.caritas@gmail.com>

* Version: 0.6

* pi.py: Initial implementation of embolden.

* pi.py: Use norm coordinate in class Page and Line.

* pi.py: Add edge trimming support.

* pi.py: Add run pages mode.

* pi.py: Add page range support.

* pi.py: Re-work ImageOutput, split multi-page image.

* pi.py: Rotate during scale if appropriate.

* img_dir_to_pdf.sh: Add color reduction support.

Gianfranco · 08-11-2008, 06:03 PM

I used v0.5 to merge all files into a PDF, but the result was negated: the text was white and the page was black. What could have caused this?

Am I the only one who has experienced it?

Best regards
Gianfranco Alongi

PS: Great tool!

hansl · 08-12-2008, 04:14 AM

Quote:

Originally Posted by Gianfranco

I used v0.5 to merge all files into a PDF, but the result was negated: the text was white and the page was black. What could have caused this?

Am I the only one who has experienced it?

Best regards
Gianfranco Alongi

PS: Great tool!

I had the same problem and it went away with this fix:

in img_dir_to_pdf.sh line 27 change
tiff2pdf -z -o $cwd/$pdf_fn pdf-$pdf_fn.tiff
to
tiff2pdf -n -z -o $cwd/$pdf_fn pdf-$pdf_fn.tiff

I have not tried it, but in v0.6 caritas changed that to
tiff2pdf -nz ... so I guess 0.6 will work out of the box.

hansl

Gianfranco · 08-12-2008, 08:27 AM

Okay. Nice.
I'll try v0.6 directly once I get home from work.

And once again: what a great tool!

Caritas, maybe you should consider releasing a howto and tutorial for the tool?

Gianfranco · 08-12-2008, 05:23 PM

I used the new release and I am pleased.

I wrote a little about this in my blog.

xiblack · 08-20-2008, 01:45 AM

Quote:

Originally Posted by nrapallo

Overall, pi version 0.3 works well, but I ran into some obstacles trying to 'windows-ize' it.

I succeeded in converting the sample .pdf using 'pi_format chap6.conf' on a Windows PC, but it was a brute-force finish that cannot be used in general. More testing/exploring is required to yield a windows only solution (in addition to the working linux based solution offered by the original poster).

In pi.py, I had to change the line that builds img_fn (bold in the original post) to conform with the output of pdftoppm.exe (from xpdf), which has the form "chap6-004-page-000004.pgm", i.e. a 6-digit page number before .pgm.

Code:

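# Excerpt of a method from pi.py; `call` is presumably subprocess.call (import not shown).
def get_img(self, dpi = 150, out_prefix = None):
    pdf_fn = self.doc.pdf_fn
    if out_prefix is None:
        out_prefix = '%spage' % (self.output_prefix,)
    spage = '%d' % (self.page_no,)
    sdpi = '%d' % (dpi,)
    # Render only this page to a grayscale PGM at the requested resolution.
    ret = call(['pdftoppm', '-r', sdpi, '-f', spage, '-l', spage, '-gray',
                pdf_fn, out_prefix])
    assert ret == 0
    # Changed line: xpdf's pdftoppm.exe writes a 6-digit page number.
    img_fn = '%s-%06d.pgm' % (out_prefix, self.page_no)
    return img_fn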
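
Hard-coding the digit count ties the code to one pdftoppm build. A more tolerant sketch tries several zero-padding widths and returns the first file that exists. The helper name find_page_image and the list of widths are assumptions for illustration; only the 6-digit form is confirmed in this thread.

```python
import os


def find_page_image(out_prefix, page_no, widths=(6, 3, 2, 1)):
    """Return the first existing '<prefix>-<zero-padded page>.pgm' file, or None."""
    for width in widths:
        # '%0*d' takes the padding width from the argument tuple.
        candidate = "%s-%0*d.pgm" % (out_prefix, width, page_no)
        if os.path.exists(candidate):
            return candidate
    return None
```

In get_img, the return value could replace the hard-coded img_fn, with an explicit error when nothing is found.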

Hi,

I tried the latest pi_06 on my SuSE OSS 10.0.0; it didn't work until I applied the fix above.

After the fix, pi_06 works well, but I hit this error after some pages had been generated:

Quote:

...
page: 30
page: 31
page: 32
Traceback (most recent call last):
  File "/home/name/download/pi/bin/pi_format.py", line 67, in ?
    test_all(sys.argv[1])
  File "/home/name/download/pi/bin/pi_format.py", line 16, in test_all
    doc.reformat()
  File "/home/name/download/pi/bin/pi.py", line 1495, in reformat
    page.rend()
  File "/home/name/download/pi/bin/pi.py", line 761, in rend
    self.img = self.img.filter(ImageFilter.MinFilter(3))
  File "/usr/lib/python2.4/site-packages/PIL/ImageFile.py", line 715, in filter
    self.load()
  File "/usr/lib/python2.4/site-packages/PIL/ImageFile.py", line 148, in load
    self.im = Image.core.map_buffer(
ValueError: buffer is not large enough

Is there somewhere I can make the buffer larger, or is this some built-in limit?
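
One possible cause, stated as an assumption rather than a diagnosis: PIL maps the image file into memory, and the mapping fails when the file is shorter than its header promises, for example a truncated .pgm from pdftoppm. A small sketch that checks a binary PGM (P5) file before loading it (pgm_payload_ok is a hypothetical helper, not part of pi.py or PIL):

```python
import os


def pgm_payload_ok(path):
    """Return True if a binary PGM (P5) file is long enough for its header."""
    with open(path, "rb") as f:
        data = f.read(1024)  # the header is tiny; 1 KiB is plenty for a sketch
    tokens, pos = [], 0
    while len(tokens) < 4 and pos < len(data):
        ch = data[pos:pos + 1]
        if ch.isspace():
            pos += 1
        elif ch == b"#":
            # Comment: skip to the end of the line.
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
        else:
            start = pos
            while pos < len(data) and not data[pos:pos + 1].isspace():
                pos += 1
            tokens.append(data[start:pos])
    if len(tokens) < 4 or tokens[0] != b"P5":
        return False
    try:
        width, height, maxval = (int(t) for t in tokens[1:4])
    except ValueError:
        return False
    bytes_per_sample = 1 if maxval < 256 else 2
    header_len = pos + 1  # exactly one whitespace byte follows maxval
    needed = header_len + width * height * bytes_per_sample
    return os.path.getsize(path) >= needed
```

If this returns False for the page that fails, the problem lies in the rendering step (or disk space), not in a buffer setting inside PIL.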

ashkulz · 08-25-2008, 10:09 AM

I've attached a working version of pi-0.6 which will work under Windows. I had to make a few changes in the code, which have been attached as a diff. Probably caritas could apply them in the next release (they're generic).

Usage: Unzip pi-0.6-win32.zip somewhere and run as instructed above by caritas (You'll need a working Python with PIL installation). In case you want the proper fonts, unzip xpdf-fonts.zip in the same directory and adjust the paths in bin/xpdfrc (right now it's hardcoded to C:\pi).

Enjoy!

nrapallo · 08-25-2008, 10:46 AM

Quote:

Originally Posted by ashkulz

I've attached a working version of pi-0.6 which will work under Windows. I had to make a few changes in the code, which have been attached as a diff. Probably caritas could apply them in the next release (they're generic).

Usage: Unzip pi-0.6-win32.zip somewhere and run as instructed above by caritas (You'll need a working Python with PIL installation). In case you want the proper fonts, unzip xpdf-fonts.zip in the same directory and adjust the paths in bin/xpdfrc (right now it's hardcoded to C:\pi).

Enjoy!

Well done Ashish!

Now I FINALLY can get to try this out (in WinXP) and perhaps incorporate it into PDFRead. Or do you want to do that as I'm at a disadvantage not knowing python as well as you (and it is your original creation)?

Thank you for doing this; I had given up trying to get my windows implementation to work.

BTW, I got a proxy server working for the REB1200 if you are interested. It's in the Fictionwise forum and called Linreb.

Regards,

caritas · 08-30-2008, 03:33 AM

Quote:

Originally Posted by ashkulz

I've attached a working version of pi-0.6 which will work under Windows. I had to make a few changes in the code, which have been attached as a diff. Probably caritas could apply them in the next release (they're generic).

Thank you very much!

I will add it to the next version.

caritas · 08-30-2008, 04:50 AM

Version 0.7 is released. The ChangeLog is as follows:

2008-08-30 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.7

* pi.py: Add LRF output support.

* pi.py: Add TOC support for LRF output format.

* pi.py: Add output rotate support.

* pdfminfo: Add pdfminfo to extract PDF information such as TOC,
title, author, etc.

* overall: Add initial windows support, thanks ashkulz of
mobileread forum.

ashkulz · 08-30-2008, 07:44 PM

Quote:

Originally Posted by caritas

Version 0.7 is released. The ChangeLog is as follows:

2008-08-30 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.7

* pi.py: Add LRF output support.

* pi.py: Add TOC support for LRF output format.

* pi.py: Add output rotate support.

* pdfminfo: Add pdfminfo to extract PDF information such as TOC,
title, author, etc.

* overall: Add initial windows support, thanks ashkulz of
mobileread forum.

I'm attaching pi_page_parse 0.7 compiled for Windows. The usage should be similar to the 0.6 version (if you want, install 0.6 first and then replace all *.py in the bin folder with the ones from the 0.7 version).

nrapallo · 09-10-2008, 12:21 PM

I tried ashkulz's win32 executable (v0.6) and obtained great results converting the sample chap6.pdf into my reader's native .imp format, using PDFRead on the resulting .png files in the out folder.

The optimal PDFRead settings used were:

1. In Format 'imgdir'
2. Out Format 'imp2' for EBW1150 or 'imp1' for REB1200. Just substitute your reader's format here instead.
3. Use a 'portrait-p' profile and 'portrait-full' layout mode
4. Check the 'no dilation' box (I tried dilation, and since the pi .png files are in a lower resolution it looks terrible!)
5. Click 'Convert'

Looks promising; now it only remains to get the pi algorithm incorporated into PDFRead (with GUI)!

nrapallo · 09-10-2008, 05:12 PM

Quote:

Originally Posted by nrapallo

I tried ashkulz's win32 executable (v0.6) and obtained great results converting the sample chap6.pdf into my reader's native .imp format, using PDFRead on the resulting .png files in the out folder.

The optimal PDFRead settings used were:

1. In Format 'imgdir'
2. Out Format 'imp2' for EBW1150 or 'imp1' for REB1200. Just substitute your reader's format here instead.
3. Use a 'portrait-p' profile and 'portrait-full' layout mode
4. Check the 'no dilation' box (I tried dilation, and since the pi .png files are in a lower resolution it looks terrible!)
5. Click 'Convert'

Looks promising; now it only remains to get the pi algorithm incorporated into PDFRead (with GUI)!

Just a note that I attached .lrf and .prc versions of the above sample chap6.pdf here.

These are for the other popular small-screen ebook readers, the Sony PRS-500/505 and the Kindle.


