Old 04-22-2008, 08:52 AM   #1
caritas
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505

I have been interested in ebook readers for quite a while. But after trying a 6-inch e-ink reader (Hanlin V3), I found that these machines are almost useless for reading normal PDF files. The font size is too small and the page is too wide for the screen.

So I thought about and prototyped a method to render PDFs for these small devices. The details are as follows:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a

2. Analyse the generated images and break each page into lines.

3. Divide each line that is long enough into two segments.

4. Rearrange the segments into a new page with half the original width.
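
To make steps 2 to 4 more concrete, here is a minimal sketch in Python. It is not the code from the attached archive, only an illustration under simplifying assumptions: the page is already a binary image stored as a list of rows (1 = ink, 0 = white), lines are found as runs of rows that contain ink, and a line counts as "long enough" when its ink reaches past the middle of the page. The names find_lines, ink_width and reflow are made up for this example.

```python
# Illustrative sketch only; not the code from the attached archive.
# A page is a list of rows; each row is a list of 0 (white) / 1 (ink) pixels.

def find_lines(page):
    """Step 2: return (top, bottom) row ranges of text lines, bottom exclusive."""
    lines, start = [], None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                    # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))     # blank row closes the current line
            start = None
    if start is not None:                # line touching the bottom edge
        lines.append((start, len(page)))
    return lines


def ink_width(page, top, bottom):
    """Return the column index just past the rightmost ink pixel of a line."""
    right = 0
    for row in page[top:bottom]:
        for x, px in enumerate(row):
            if px:
                right = max(right, x + 1)
    return right


def reflow(page, gap=1):
    """Steps 3 and 4: split long lines in two and stack the segments.

    The result is a new page that is half as wide (rounded up). `gap` is the
    number of blank rows put after each segment (an assumed layout choice).
    """
    width = len(page[0])
    half = (width + 1) // 2
    out = []
    for top, bottom in find_lines(page):
        rows = page[top:bottom]
        segments = [[r[:half] for r in rows]]
        if ink_width(page, top, bottom) > half:
            # Pad the right part so every output row has the same width.
            segments.append(
                [r[half:] + [0] * (half - len(r[half:])) for r in rows]
            )
        for seg in segments:
            out.extend(seg)
            out.extend([[0] * half for _ in range(gap)])
    return out
```

This sketch cuts every long line exactly in the middle, so it can cut through a character. A real implementation has to choose the cut position more carefully.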

Example images from before and after conversion are attached to the post. I think the result is acceptable.

The source code is attached to the post too. It is released under the GPL v2/v3.

Best Regards,
Huang Ying

Basic Usage for version 0.4:

tar -xjf pi_0.4.tar.bz2
cd pi
cd test chap.conf
/* output goes in out directory */ out chap-rf.pdf

2008-09-20 Huang Ying <>

* Version: 0.8

* overall: Reorganize program in a more modular way.

* pi.image: Add unpaper support for scanned book

* pi.image: Add column compress support for scanned book

* pi.divide: Add simple divider for divide = 1

2008-08-30 Huang Ying <>

* Version: 0.7

* Add LRF output support.

* Add TOC support for LRF output format

* Add output rotate support.

* pdfminfo: Add pdfminfo to extract PDF information such as TOC,
title, author, etc.

* overall: Add initial windows support, thanks ashkulz of
mobileread forum.

2008-08-11 Huang Ying <>

* Version: 0.6

* Initial implementation of embolden.

* Use norm coordinate in class Page and Line.

* Add edge trimming support.

* Add run pages mode.

* Add page range support.

* Re-work ImageOutput, split multi-page image.

* Rotate during scale if appropriate.

* Add color reduction support.

2008-05-17 Huang Ying <>

* Version: 0.5

* Detect word, and break lines at word end when possible.

* Re-align the 'split line segment' (second half of line)
to align with the next line's indenting when appropriate. This
will make the first line indent and bullet items line up better.

* Add conversion from images to PDF.
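
As a rough illustration of the "break lines at word end" idea in this version (not the actual implementation), one can look for runs of ink-free columns inside a line and cut at the run closest to the middle. The sketch below assumes a line is a list of pixel rows (1 = ink, 0 = white). The names column_gaps and break_column and the min_gap threshold are hypothetical.

```python
# Illustrative sketch only; the real word detection may work differently.

def column_gaps(rows, min_gap=2):
    """Return (start, end) runs of ink-free columns, end exclusive.

    Only runs between inked columns are returned; the left and right
    margins are ignored. Runs narrower than `min_gap` are treated as
    spacing between letters, not between words.
    """
    width = len(rows[0])
    empty = [not any(r[x] for r in rows) for x in range(width)]
    gaps, start = [], None
    for x, is_empty in enumerate(empty):
        if is_empty and start is None:
            start = x
        elif not is_empty and start is not None:
            if start > 0 and x - start >= min_gap:
                gaps.append((start, x))
            start = None
    return gaps


def break_column(rows, min_gap=2):
    """Return the column at which to split the line.

    Picks the centre of the word gap nearest the middle of the line and
    falls back to the exact middle when no gap is found.
    """
    mid = len(rows[0]) // 2
    gaps = column_gaps(rows, min_gap)
    if not gaps:
        return mid
    start, end = min(gaps, key=lambda g: abs((g[0] + g[1]) // 2 - mid))
    return (start + end) // 2
```

The fallback to the exact middle corresponds to "when possible" in the entry above: if no word gap is found, the line is still split.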

2008-05-10 Huang Ying <>

* Version: 0.4

* Some algorithms are configurable

* For text that may be problematic, present both the merged and the divided version

2008-05-03 Huang Ying <>

* Version: 0.3

* Rewrite most algorithms in Python except the image parsing (break
image into lines and characters). This will make it easier to
add new algorithms (hacks).

* Add some hacks to deal with equations and figures.

2008-04-29 Huang Ying <>

* Version: 0.2

* Split lines in two equal halves or optional equal thirds or
equal quarters

* Separate output image into customizable page size

* Flex can be designated by user configuration

* Calculate DPI for each page

* Figure detecting and special processing. The figures are scaled
to page width and output twice, scaled and split.

2008-04-23 Huang Ying <>

* Version: 0.1
Attached Thumbnails
chap6-04-0.png (112.2 KB)
chap6-04-1.png (16.8 KB)
chap6-04-2.png (112.1 KB)
chap6-04-3.png (147.2 KB)
chap6-04-4.png (88.9 KB)
pipeline.png (91.0 KB)
Attached Files
File Type: gz pi.tar.gz (23.1 KB, 764 views)
File Type: bz2 pi_0.2.tar.bz2 (300.2 KB, 769 views)
File Type: bz2 pi_0.3.tar.bz2 (283.0 KB, 627 views)
File Type: bz2 pi_0.4.tar.bz2 (294.9 KB, 594 views)
File Type: bz2 pi_0.5.tar.bz2 (296.9 KB, 685 views)
File Type: bz2 pi_0.6.tar.bz2 (336.3 KB, 819 views)
File Type: bz2 pi_0.7.tar.bz2 (527.6 KB, 643 views)
File Type: bz2 pi_0.8.tar.bz2 (627.5 KB, 874 views)

Last edited by caritas; 09-20-2008 at 09:14 AM. Reason: Version update