Posted 04-22-2008, 08:52 AM

I have been interested in ebook readers for quite a while. But after trying a 6-inch e-ink reader (Hanlin V3), I found it almost useless for reading normal PDF files on these machines: the font is too small, because the page is too wide for the screen.

So I designed and prototyped a method for rendering PDFs on these small devices. The details are as follows:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a

2. Analyse the generated images and break each page into lines.

3. Divide each line that is long enough into two segments.

4. Rearrange the segments into a new page of half the width.
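To make steps 2-4 concrete, here is a minimal Python sketch. It is not code from pi; it only illustrates the idea under some assumptions: the page has already been binarized into a list of rows of 0/1 ints (1 = ink), lines are found with a simple horizontal projection profile, and long lines are cut exactly at the middle. The names `find_lines`, `ink_extent`, `split_line` and `reflow` are hypothetical.

```python
# Hypothetical sketch of steps 2-4, not the actual pi code.
# Assumption: a page is a binarized bitmap, a list of rows of 0/1 ints (1 = ink).
from typing import List, Tuple

Bitmap = List[List[int]]


def find_lines(page: Bitmap) -> List[Tuple[int, int]]:
    """Step 2: return (top, bottom) row ranges of text lines, bottom exclusive.

    Horizontal projection profile: any row containing ink belongs to a line,
    and blank rows separate lines.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(page):
        if any(row) and start is None:
            start = y                      # first inked row of a new line
        elif not any(row) and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(page)))
    return lines


def ink_extent(line: Bitmap) -> int:
    """Index just past the rightmost inked column (0 for a blank line)."""
    for x in range(len(line[0]) - 1, -1, -1):
        if any(row[x] for row in line):
            return x + 1
    return 0


def split_line(line: Bitmap, half: int) -> List[Bitmap]:
    """Step 3: cut a line at column `half` if its ink reaches past that column."""
    if ink_extent(line) <= half:
        return [[row[:half] for row in line]]   # short line: fits unchanged
    return [[row[:half] for row in line], [row[half:] for row in line]]


def reflow(page: Bitmap) -> Bitmap:
    """Step 4: stack all segments into a new page of half the original width."""
    half = (len(page[0]) + 1) // 2
    out: Bitmap = []
    for top, bottom in find_lines(page):
        for segment in split_line(page[top:bottom], half):
            for row in segment:
                out.append(row + [0] * (half - len(row)))  # pad to page width
            out.append([0] * half)                          # blank separator row
    return out


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 1, 1],   # a long line: ink on both halves
        [0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0],   # a short line: fits in half the width
    ]
    for row in reflow(page):
        print("".join("#" if v else "." for v in row))
```

Cutting exactly at the middle can slice a character in half; the word detection added in version 0.5 addresses that.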

Example images from before and after conversion are attached to this post. I think the result is acceptable.

The source code is attached as well. It is released under the GPL v2/v3.

Best Regards,
Huang Ying

Basic Usage for version 0.4:

tar -xjf pi_0.4.tar.bz2
cd pi
cd test chap.conf
/* output goes in out directory */ out chap-rf.pdf

2008-09-20 Huang Ying <>

* Version: 0.8

* overall: Reorganize the program to be more modular.

* pi.image: Add unpaper support for scanned books

* pi.image: Add column compress support for scanned books

* pi.divide: Add simple divider for divide = 1

2008-08-30 Huang Ying <>

* Version: 0.7

* Add LRF output support.

* Add TOC support for LRF output format

* Add output rotation support.

* pdfminfo: Add pdfminfo to extract PDF information such as TOC,
title, author, etc.

* overall: Add initial Windows support, thanks to ashkulz of the
MobileRead forum.
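The 0.7 entry mentions output rotation. How pi implements it is not described here; as a minimal Python sketch, assuming an output page is a list of pixel rows, a 90-degree rotation is just a transpose plus a reversal. The function names are hypothetical.

```python
# Hypothetical sketch of rotating an output page by 90 degrees.
# Assumption: a page is a list of rows of pixel values.
from typing import List, TypeVar

T = TypeVar("T")


def rotate_clockwise(page: List[List[T]]) -> List[List[T]]:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*page[::-1])]


def rotate_counterclockwise(page: List[List[T]]) -> List[List[T]]:
    """Rotate 90 degrees counterclockwise: transpose, then reverse the row order."""
    return [list(col) for col in zip(*page)][::-1]


if __name__ == "__main__":
    page = [[1, 2, 3],
            [4, 5, 6]]
    print(rotate_clockwise(page))         # 3 rows of 2 pixels
    print(rotate_counterclockwise(page))
```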

2008-08-11 Huang Ying <>

* Version: 0.6

* Initial implementation of emboldening.

* Use norm coordinates in the Page and Line classes.

* Add edge trimming support.

* Add run pages mode.

* Add page range support.

* Rework ImageOutput; split multi-page images.

* Rotate during scaling if appropriate.

* Add color reduction support.
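Edge trimming, emboldening and color reduction from the 0.6 entry can each be expressed as simple pixel arithmetic. The Python sketch below assumes an 8-bit grayscale page (a list of rows, 0 = black, 255 = white); the threshold of 128, the one-pixel horizontal thickening and the four output levels are illustrative assumptions rather than pi's actual choices, and the function names are hypothetical.

```python
# Hypothetical sketches of edge trimming, emboldening and color reduction.
# Assumption: 8-bit grayscale page, list of rows of ints, 0 = black, 255 = white.
from typing import List

Gray = List[List[int]]


def trim_edges(page: Gray, threshold: int = 128) -> Gray:
    """Crop blank margins to the bounding box of pixels darker than `threshold`."""
    rows = [y for y, row in enumerate(page) if any(v < threshold for v in row)]
    if not rows:
        return page                                 # blank page: nothing to trim
    cols = [x for x in range(len(page[0]))
            if any(row[x] < threshold for row in page)]
    return [row[cols[0]:cols[-1] + 1] for row in page[rows[0]:rows[-1] + 1]]


def embolden(page: Gray) -> Gray:
    """Thicken strokes one pixel to the right: keep the darker of each pixel
    and its left neighbour (a 1x2 minimum filter)."""
    return [[min(row[x], row[x - 1]) if x > 0 else row[x] for x in range(len(row))]
            for row in page]


def reduce_colors(page: Gray, levels: int = 4) -> Gray:
    """Quantize 256 gray levels to `levels` evenly spaced values (levels >= 2)."""
    step = 255 / (levels - 1)
    return [[round(round(v / step) * step) for v in row] for row in page]


if __name__ == "__main__":
    page = [[255, 255, 255, 255],
            [255, 0, 255, 255],
            [255, 255, 100, 255],
            [255, 255, 255, 255]]
    print(reduce_colors(embolden(trim_edges(page))))  # [[0, 0], [255, 85]]
```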

2008-05-17 Huang Ying <>

* Version: 0.5

* Detect words, and break lines at word ends when possible.

* Re-align the 'split line segment' (second half of a line)
with the next line's indentation when appropriate. This
makes first-line indents and bullet items line up better.

* Added to convert from images to pdf.
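Breaking lines at word ends, as in the 0.5 entry, can be approximated with a vertical projection: columns inside a line that contain no ink are gaps, and a wide enough gap is probably a space between words. A hedged Python sketch follows; the 0/1 line format, `min_gap` and `max_shift` are assumptions for illustration, and this is not pi's real algorithm.

```python
# Hypothetical sketch: pick a split column at a word gap near the target column.
# Assumption: a text line is a list of rows of 0/1 ints (1 = ink).
from typing import List

Bitmap = List[List[int]]


def column_gaps(line: Bitmap, min_gap: int) -> List[int]:
    """Centre columns of runs of at least `min_gap` blank columns between inked columns."""
    width = len(line[0])
    blank = [not any(row[x] for row in line) for x in range(width)]
    gaps: List[int] = []
    x = 0
    while x < width:
        if not blank[x]:
            x += 1
            continue
        start = x
        while x < width and blank[x]:
            x += 1
        # start > 0 and x < width: skip the left and right margins
        if start > 0 and x < width and x - start >= min_gap:
            gaps.append((start + x) // 2)
    return gaps


def choose_split(line: Bitmap, target: int, min_gap: int = 2, max_shift: int = 10) -> int:
    """Return the word gap closest to `target` (within `max_shift` columns),
    or `target` itself as a hard cut when no such gap exists."""
    candidates = [g for g in column_gaps(line, min_gap) if abs(g - target) <= max_shift]
    if not candidates:
        return target
    return min(candidates, key=lambda g: abs(g - target))


if __name__ == "__main__":
    line = [[1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
            [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]]
    print(choose_split(line, target=5))   # 4: centre of the gap in columns 3-4
```

Cutting at a gap instead of the exact middle makes the two segments unequal, so the output page needs a little slack in width.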

2008-05-10 Huang Ying <>

* Version: 0.4

* Some algorithms are now configurable

* For text that may cause problems, present both the merged and the divided versions

2008-05-03 Huang Ying <>

* Version: 0.3

* Rewrite most algorithms in Python, except the image parsing (breaking
the image into lines and characters). This makes it easier to
add new algorithms (hacks).

* Add some hacks to deal with equations and figures.

2008-04-29 Huang Ying <>

* Version: 0.2

* Split lines into two equal halves, or optionally into equal thirds
or equal quarters

* Split output images into pages of a customizable size

* Flex can be designated by user configuration

* Calculate DPI for each page

* Figure detection and special processing: figures are scaled
to the page width and output twice, once scaled and once split.
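Version 0.2 generalizes the split to thirds and quarters and cuts the output into pages of a chosen size. Here is a minimal Python sketch of both ideas, assuming segments are bitmaps (lists of pixel rows) and that pages are filled greedily from top to bottom; the function names and the packing rule are assumptions, not taken from pi.

```python
# Hypothetical sketch of n-way line splitting and greedy pagination.
# Assumption: a segment is a bitmap, a list of rows of pixel values.
from typing import List

Bitmap = List[List[int]]


def split_equal(line: Bitmap, parts: int) -> List[Bitmap]:
    """Cut a line into `parts` pieces of (nearly) equal width."""
    width = len(line[0])
    bounds = [width * i // parts for i in range(parts + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in line] for i in range(parts)]


def paginate(segments: List[Bitmap], page_height: int) -> List[List[Bitmap]]:
    """Greedily pack segments, top to bottom, into pages no taller than
    `page_height`. A segment taller than a page gets a page of its own."""
    pages: List[List[Bitmap]] = []
    current: List[Bitmap] = []
    used = 0
    for seg in segments:
        if current and used + len(seg) > page_height:
            pages.append(current)          # page is full: start a new one
            current, used = [], 0
        current.append(seg)
        used += len(seg)
    if current:
        pages.append(current)
    return pages


if __name__ == "__main__":
    line = [[0, 1, 2, 3, 4, 5, 6, 7, 8]]
    thirds = split_equal(line, 3)          # three 3-pixel-wide pieces
    pages = paginate(thirds * 4, page_height=5)
    print(len(pages), [len(p) for p in pages])  # 3 [5, 5, 2]
```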

2008-04-23 Huang Ying <>

* Version: 0.1
Attached Thumbnails
chap6-04-0.png (112.2 KB)
chap6-04-1.png (16.8 KB)
chap6-04-2.png (112.1 KB)
chap6-04-3.png (147.2 KB)
chap6-04-4.png (88.9 KB)
pipeline.png (91.0 KB)

Attached Files
pi.tar.gz (23.1 KB)
pi_0.2.tar.bz2 (300.2 KB)
pi_0.3.tar.bz2 (283.0 KB)
pi_0.4.tar.bz2 (294.9 KB)
pi_0.5.tar.bz2 (296.9 KB)
pi_0.6.tar.bz2 (336.3 KB)
pi_0.7.tar.bz2 (527.6 KB)
pi_0.8.tar.bz2 (627.5 KB)

Last edited by caritas; 09-20-2008 at 09:14 AM. Reason: Version update