MobileRead Forums > E-Book Formats > PDF

#1 (04-22-2008, 08:52 AM)
caritas
Hi,

I have been interested in e-book readers for quite a while. But after trying a 6-inch e-ink reader (Hanlin V3), I found it almost useless for reading normal PDF files: the font is too small, while the page is too wide.

So I worked out and prototyped a method to render PDFs for these small devices. The details are as follows:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a

2. Analyse the generated images and break each page into lines.

3. Divide each sufficiently long line into two segments.

4. Rearrange the segments into a new page half as wide.
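Step 2 can be illustrated with a horizontal projection profile: count the dark pixels in each row and treat runs of blank rows as the gaps between text lines. This is a minimal Python sketch, not pi's own image parser; it assumes the page is already decoded into a list of rows of 8-bit gray values, and the names `ink_profile`/`find_text_lines` and the threshold of 128 are illustrative.

```python
from typing import List, Tuple

# A grayscale page as a list of rows of 8-bit values (0 = black, 255 = white),
# roughly what a "pdftoppm -gray" PGM decodes to.
Image = List[List[int]]


def ink_profile(img: Image, threshold: int = 128) -> List[int]:
    """Count the dark ("ink") pixels in every row."""
    return [sum(1 for px in row if px < threshold) for row in img]


def find_text_lines(img: Image, threshold: int = 128,
                    min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    Rows with fewer than min_ink dark pixels count as blank; runs of blank
    rows are the gaps between lines.
    """
    lines = []
    top = None
    for y, ink in enumerate(ink_profile(img, threshold)):
        if ink >= min_ink and top is None:
            top = y                       # first inked row of a new line
        elif ink < min_ink and top is not None:
            lines.append((top, y))        # first blank row after a line
            top = None
    if top is not None:                   # a line touching the bottom edge
        lines.append((top, len(img)))
    return lines


W, B = 255, 0
page = [
    [W, W, W, W],
    [W, B, B, W],
    [B, W, B, W],
    [W, W, W, W],
    [W, B, W, W],
]
print(find_text_lines(page))  # [(1, 3), (4, 5)]
```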

Example images from before and after conversion are attached to this post. I think the result is acceptable.

The source code is attached as well; it is released under the GPL v2/v3.

Best Regards,
Huang Ying

Basic Usage for version 0.4:

tar -xjf pi_0.4.tar.bz2
cd pi
. env.sh
cd test
pi_format.py chap.conf
# output goes in the out directory
img_dir_to_pdf.sh out chap-rf.pdf


2008-09-20 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.8

* overall: Reorganize program in a more modular way.

* pi.image: Add unpaper support for scanned books

* pi.image: Add column compression support for scanned books

* pi.divide: Add simple divider for divide = 1

2008-08-30 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.7

* pi.py: Add LRF output support.

* pi.py: Add TOC support for LRF output format

* pi.py: Add output rotate support.

* pdfminfo: Add pdfminfo to extract PDF information such as TOC,
title, author, etc.

* overall: Add initial Windows support, thanks to ashkulz of the
MobileRead forum.

2008-08-11 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.6

* pi.py: Initial implementation of embolden.

* pi.py: Use normalized coordinates in classes Page and Line.

* pi.py: Add edge trimming support.

* pi.py: Add run pages mode.

* pi.py: Add page range support.

* pi.py: Re-work ImageOutput, split multi-page image.

* pi.py: Rotate during scaling if appropriate.

* img_dir_to_pdf.sh: Add color reduction support.

2008-05-17 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.5

* pi.py: Detect words, and break lines at word ends when possible.

* pi.py: Re-align the 'split line segment' (second half of a line)
with the next line's indentation when appropriate. This makes
first-line indents and bullet items line up better.

* img_dir_to_pdf.sh: Added to convert from images to pdf.

2008-05-10 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.4

* Make some algorithms configurable.

* Since some text may cause problems, present both the merged and
the divided version.


2008-05-03 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.3

* Rewrite most algorithms in Python, except the image parsing
(breaking the image into lines and characters). This makes it
easier to add new algorithms (hacks).

* pi.py: Add some hacks to deal with equations and figures.


2008-04-29 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.2

* Split lines in two equal halves or optional equal thirds or
equal quarters

* Separate output image into customizable page size

* The flex can be set by user configuration

* Calculate DPI for each page

* Figure detecting and special processing. The figures are scaled
to page width and output twice, scaled and split.


2008-04-23 Huang Ying <huang.ying.caritas@gmail.com>

* Version: 0.1
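The "Calculate DPI for each page" item in version 0.2 can be illustrated with a little arithmetic, under the assumption that the goal is for one output segment to fill the screen width exactly. PDF lengths are in points (72 per inch), so a text block 360 pt wide is 5 inches; with a two-way split and a 1/6 flex each segment is 5 × 2/3 inches, and a hypothetical 600-pixel-wide screen then needs 600 / (10/3) = 180 DPI, which happens to match the `-r 180` in the pdftoppm example. `render_dpi` is an illustrative name, not part of pi.

```python
import math
from fractions import Fraction


def render_dpi(screen_px: int, text_width_pt: int, n: int = 2,
               flex: Fraction = Fraction(1, 6)) -> int:
    """DPI at which one output segment, (1/n + flex) of the text width,
    renders screen_px pixels wide. PDF lengths are in points, 72 per inch.
    Rounds down so the segment never exceeds the screen width."""
    segment_inches = Fraction(text_width_pt, 72) * (Fraction(1, n) + flex)
    return math.floor(screen_px / segment_inches)


print(render_dpi(600, 360))       # 180
print(render_dpi(600, 360, n=3))  # 240
```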
Attached Thumbnails
chap6-04-0.png (112.2 KB), chap6-04-1.png (16.8 KB), chap6-04-2.png (112.1 KB), chap6-04-3.png (147.2 KB), chap6-04-4.png (88.9 KB), pipeline.png (91.0 KB)
Attached Files
pi.tar.gz (23.1 KB)
pi_0.2.tar.bz2 (300.2 KB)
pi_0.3.tar.bz2 (283.0 KB)
pi_0.4.tar.bz2 (294.9 KB)
pi_0.5.tar.bz2 (296.9 KB)
pi_0.6.tar.bz2 (336.3 KB)
pi_0.7.tar.bz2 (527.6 KB)
pi_0.8.tar.bz2 (627.5 KB)

Last edited by caritas; 09-20-2008 at 09:14 AM. Reason: Version update

#2 (04-22-2008, 11:58 AM)
radius
Result looks excellent for the amount of intelligence used in the algorithm.

This is a good hack for documents we can't reflow and resize.

#3 (04-22-2008, 12:13 PM)
nrapallo
Very nice idea, indeed!

I may try this out in PDFRead, as an alternative for smaller screen devices like the EBW1150. Hopefully I can just 'call' your executable from within PDFRead and avoid having to recode your efforts in python.

I remember that the original developer of PDFRead was going to allow some type of reflow of pdf documents, but never released his efforts.

One question though:
Is the split at half the page width "fixed" or can it be changed to a user inputted amount, like one-third or 25%?

#4 (04-22-2008, 10:18 PM)
caritas
> Is the split at half the page width "fixed" or can it be changed to a user-inputted amount, like one-third or 25%?

Currently it is fixed. But why split the line at 1/3 or 1/4? That would produce one long line and one short line from each original line.

The page width actually generated is 1/2 + 1/6 = 2/3 of the original text width. The additional 1/6 gives room to find a space between words at which to break the line.
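To make the 1/2 + 1/6 arithmetic concrete, here is a hypothetical Python sketch of steps 3 and 4 for a single line: it searches the flex window between 1/2 and 2/3 of the line width for blank columns, cuts in the middle of the widest blank run (assumed to be a space between words), and pads both segments with white to 2/3 of the original width. `choose_split`, `split_line`, the threshold and the "widest run" rule are assumptions, not the actual pi code.

```python
import math
from fractions import Fraction
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def column_is_blank(line: Image, x: int, threshold: int = 128) -> bool:
    """True if column x of the line image contains no dark pixel."""
    return all(row[x] >= threshold for row in line)


def choose_split(line: Image, flex: Fraction = Fraction(1, 6)) -> int:
    """Column at which to cut a text line into two segments.

    The nominal cut is at half the width; the search window reaches
    flex * width further right. Inside it, cut in the middle of the widest
    run of blank columns (assumed to be a word space). With no blank
    column in the window, fall back to the nominal cut.
    """
    width = len(line[0])
    start = width // 2
    stop = min(width, start + int(flex * width) + 1)
    best: Optional[Tuple[int, int]] = None   # (run length, run start)
    run_start = None
    for x in range(start, stop + 1):
        blank = x < stop and column_is_blank(line, x)  # x == stop closes a run
        if blank and run_start is None:
            run_start = x
        elif not blank and run_start is not None:
            if best is None or x - run_start > best[0]:
                best = (x - run_start, run_start)
            run_start = None
    if best is None:
        return start
    length, begin = best
    return begin + length // 2


def split_line(line: Image, flex: Fraction = Fraction(1, 6)) -> List[Image]:
    """Cut a line in two and pad both segments with white to (1/2 + flex) of its width."""
    width = len(line[0])
    out_width = math.ceil(width * (Fraction(1, 2) + flex))
    cut = choose_split(line, flex)
    return [[row[lo:hi] + [255] * (out_width - (hi - lo)) for row in line]
            for lo, hi in ((0, cut), (cut, width))]


# One-row "line": X = ink, . = paper. The blank column at 5 lies left of the
# window (columns 6..8), so the cut goes into the gap at columns 7..8.
line = [[0 if c == "X" else 255 for c in "XXXXX.X..XXX"]]
print(choose_split(line))                          # 8
print([len(seg[0]) for seg in split_line(line)])   # [8, 8]
```

Stacking the two returned segments one above the other gives the rearranged line of step 4.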

#5 (04-22-2008, 10:47 PM)
nrapallo
Sorry, what I meant by a 1/3 split is to cut the line into three equal portions and then triple the page height, placing the two additional segments beneath the first one. The resulting page width would be 1/3 + 1/6 = 1/2 of the original.

This is just like you do for 1/2 split (two equal halves with one line below the other).

By extension, 1/4 split would result in four lines of text from one and quadruple the height!

This would help clarity on smaller screens, since shorter lines can be rendered/cropped at a larger size.

When I looked at your code, I thought this would be easy to do. I think the 1/6 would be constant amongst these differing split methods.

Am I on the right track here?
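nrapallo's arithmetic generalizes directly. Assuming, as he suggests, that the flex stays at 1/6, each segment of an n-way split is 1/n + 1/6 of the original text width, and scaling one segment (rather than the whole line) to the screen width enlarges the text by the reciprocal of that. A quick check with exact fractions (the function names are illustrative):

```python
from fractions import Fraction


def segment_width(n: int, flex: Fraction = Fraction(1, 6)) -> Fraction:
    """Output segment width as a fraction of the original text width
    when each line is cut into n equal parts plus a constant flex."""
    return Fraction(1, n) + flex


def magnification(n: int, flex: Fraction = Fraction(1, 6)) -> Fraction:
    """How much larger the text gets when one segment, instead of the
    whole line, is scaled to the screen width."""
    return 1 / segment_width(n, flex)


for n in (2, 3, 4):
    print(f"n={n}: width {segment_width(n)}, magnification {magnification(n)}")
# n=2: width 2/3, magnification 3/2
# n=3: width 1/2, magnification 2
# n=4: width 5/12, magnification 12/5
```

The text block also becomes about n times taller, since every original line turns into n stacked lines.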

#6 (04-22-2008, 11:58 PM)
caritas
OK, I see. It is easy to add such a feature. I think the 1/6 (the flex) could also be specified by the user, or derived from the PDF file by analyzing the average number of characters per line.
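One way the flex might be derived from the page, sketched under assumptions that are not from the thread: approximate the characters on each line by counting runs of inked columns, average them, and choose a flex that spans a fixed number of characters (10 here, an arbitrary choice) so the window is likely to contain a word break. With 60 characters per line this reproduces the default 1/6. `count_glyphs`, `estimate_flex`, the 10-character window and the [1/12, 1/3] clamp are all hypothetical.

```python
from fractions import Fraction
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def count_glyphs(line: Image, threshold: int = 128) -> int:
    """Rough character count: the number of runs of columns that contain ink.
    Touching letters merge and split glyphs inflate it, so it is approximate."""
    count, in_glyph = 0, False
    for x in range(len(line[0])):
        inked = any(row[x] < threshold for row in line)
        if inked and not in_glyph:
            count += 1
        in_glyph = inked
    return count


def estimate_flex(lines: List[Image], window_chars: int = 10,
                  lo: Fraction = Fraction(1, 12),
                  hi: Fraction = Fraction(1, 3)) -> Fraction:
    """Flex that spans roughly window_chars characters of an average line,
    clamped to [lo, hi]."""
    total = sum(count_glyphs(line) for line in lines)
    if total == 0:
        return hi                      # no text found: use the widest window
    avg_chars = Fraction(total, len(lines))
    return min(hi, max(lo, window_chars / avg_chars))


# A line of 60 one-pixel "characters" separated by one-pixel gaps.
line = [[0, 255] * 60]
print(estimate_flex([line]))  # 1/6
```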

#7 (04-23-2008, 08:44 AM)
IceHand
Very nice! Could you maybe include an option to split the resulting image into more than one image? For example, cut at around 33% of the height without cutting through letters. I attached the original image your program made and three images showing how the page could be split with the option I'm thinking of.
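What IceHand describes amounts to paginating at line gaps, which version 0.2 later addressed ("Separate output image into customizable page size"). The sketch below is not that code; it is a hypothetical greedy pass that assumes the (top, bottom) row ranges of the text lines are already known and fills each page until the next line would overflow `page_height` (in output pixels, e.g. about a third of the tall image, or the screen height). Blank rows between pages are dropped.

```python
from typing import List, Tuple


def paginate(lines: List[Tuple[int, int]], page_height: int) -> List[Tuple[int, int]]:
    """Group text lines, given as (top, bottom) row ranges (bottom exclusive,
    top-to-bottom order), into pages no taller than page_height, cutting only
    in the gaps between lines.

    Returns one (top, bottom) row range per output page. A single line taller
    than page_height gets a page of its own and would need separate handling.
    """
    pages = []
    page_top = page_bottom = None
    for top, bottom in lines:
        if page_top is None:
            page_top, page_bottom = top, bottom
        elif bottom - page_top <= page_height:
            page_bottom = bottom              # line still fits on this page
        else:
            pages.append((page_top, page_bottom))
            page_top, page_bottom = top, bottom
    if page_top is not None:
        pages.append((page_top, page_bottom))
    return pages


lines = [(0, 10), (15, 25), (30, 40), (45, 55)]
print(paginate(lines, page_height=30))  # [(0, 25), (30, 55)]
```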
Attached Thumbnails
pi_org.png (94.0 KB), pi_edit01.png (29.3 KB), pi_edit02.png (34.3 KB), pi_edit03.png (31.7 KB)

#8 (04-23-2008, 10:26 AM)
Azayzel
Interesting take on getting the ol' rasterized PDFs into your portable reader! Too bad the resolution isn't much better on these devices; I've been using my iPod Touch to read PDFs even though I have a Sony eReader. Still waiting on something better, but until then I might give this a shot. Guess it just chops pictures up in the mix, huh?

Thanks for the new slant on an old issue!

#9 (04-23-2008, 02:55 PM)
sealbeater
Quick comment: this doesn't compile under Linux/PPC. Looks good though; can it be scripted?

#10 (04-23-2008, 02:56 PM)
sealbeater
I take that back: I had to delete the pi.o file, and now it compiles fine. Will test it out.

#11 (04-23-2008, 07:36 PM)
-Thomas-
Hey, this is way cool, I'll give it a try on some of my PDFs!

#12 (04-23-2008, 09:23 PM)
vinniet
If anyone compiles this under Windows, could they share it?

Thanks!

#13 (04-24-2008, 04:51 AM)
kentsin
WOOW!

How about a version for old Chinese books, which are written in vertical lines?
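In principle the same projection idea could handle vertical text: project onto columns instead of rows and read the resulting column bands from right to left. This is a hypothetical Python sketch, not something the thread says pi supports; the function name and the threshold are assumptions, and splitting an over-long column would then mirror the horizontal case.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def find_vertical_lines(img: Image, threshold: int = 128) -> List[Tuple[int, int]]:
    """Return (left, right) column ranges, right exclusive, of vertical text
    lines, ordered right to left as traditional vertical text is read."""
    columns = list(zip(*img))                 # transpose: columns become rows
    ink = [sum(1 for px in col if px < threshold) for col in columns]
    spans, start = [], None
    for x, count in enumerate(ink + [0]):     # sentinel 0 closes a span at the edge
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    return spans[::-1]


img = [[0, 255, 255, 0, 0],
       [0, 255, 255, 255, 0]]
print(find_vertical_lines(img))  # [(3, 5), (0, 1)]
```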

#14 (04-24-2008, 04:57 AM)
alexxxm
works flawlessly here (Linux Fedora FC8) - and much faster than I thought possible!

Next step I guess will be to reconstruct the document from the reformatted PGMs.
Do you know a way?

Alessandro
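For the reassembly step, the author later added img_dir_to_pdf.sh (version 0.5). As a standard-library-only alternative, here is a Python sketch that wraps one 8-bit grayscale image per page into a minimal PDF (Flate-compressed DeviceGray image XObjects plus a hand-built cross-reference table). It is an illustration, not the author's script: it assumes the pages are already decoded into rows of gray values, maps one pixel to one point, and does no color reduction.

```python
import zlib
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values, 0 = black, 255 = white


def images_to_pdf(pages: List[Image]) -> bytes:
    """Build a minimal PDF with one DeviceGray image per page.

    Each page's MediaBox equals the image size, treating 1 pixel as 1 point.
    """
    objects: List[bytes] = []  # object number k is stored at objects[k - 1]

    def add(body: bytes) -> int:
        objects.append(body)
        return len(objects)

    catalog = add(b"")    # placeholders, filled in once all pages exist
    pages_obj = add(b"")
    kids = []
    for img in pages:
        h, w = len(img), len(img[0])
        data = zlib.compress(bytes(px for row in img for px in row))
        image = add(b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
                    b"/ColorSpace /DeviceGray /BitsPerComponent 8 "
                    b"/Filter /FlateDecode /Length %d >>\nstream\n"
                    % (w, h, len(data)) + data + b"\nendstream")
        # Scale the unit-square image to w x h points and paint it.
        draw = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (w, h)
        contents = add(b"<< /Length %d >>\nstream\n" % len(draw)
                       + draw + b"\nendstream")
        kids.append(add(b"<< /Type /Page /Parent %d 0 R /MediaBox [0 0 %d %d] "
                        b"/Resources << /XObject << /Im0 %d 0 R >> >> "
                        b"/Contents %d 0 R >>"
                        % (pages_obj, w, h, image, contents)))
    objects[catalog - 1] = b"<< /Type /Catalog /Pages %d 0 R >>" % pages_obj
    objects[pages_obj - 1] = (b"<< /Type /Pages /Kids [%s] /Count %d >>"
                              % (b" ".join(b"%d 0 R" % k for k in kids), len(kids)))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))             # byte offset of "num 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off    # each entry is exactly 20 bytes
    out += (b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, catalog, xref_at))
    return bytes(out)
```

Typical use would be `open("book.pdf", "wb").write(images_to_pdf(pages))`; the reader then scales each page to its screen.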

#15 (04-24-2008, 06:35 PM)
Crook
I see a lot of potential in this idea. Some future improvements could be:

* OCR of the generated images to reconstruct the PDF
* Images (or other unchoppable content) could be rescaled down

Although the first one is not a trivial task...
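For Crook's second suggestion, rescaling instead of chopping, the simplest approach is area averaging. This hypothetical Python sketch only handles integer factors (proper resamplers handle fractional ratios and better filters); `downscale` and `factor_to_fit` are illustrative names.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values


def downscale(img: Image, factor: int) -> Image:
    """Shrink an image by an integer factor with box (area) averaging.

    Each output pixel is the mean of a factor x factor block; blocks cut off
    at the right or bottom edge average only the pixels they contain.
    """
    h, w = len(img), len(img[0])
    out = []
    for y0 in range(0, h, factor):
        row = []
        for x0 in range(0, w, factor):
            block = [img[y][x]
                     for y in range(y0, min(y0 + factor, h))
                     for x in range(x0, min(x0 + factor, w))]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def factor_to_fit(width: int, target_width: int) -> int:
    """Smallest integer factor that brings width down to at most target_width."""
    return max(1, -(-width // target_width))   # ceiling division


figure = [[0, 255, 0, 255],
          [0, 255, 0, 255]]
print(downscale(figure, factor_to_fit(4, 2)))  # [[127, 127]]
```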

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.