04-22-2008, 07:52 AM | #1 |
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
Hi,
I have been interested in ebook readers for quite a while. But after trying a 6-inch e-ink reader (Hanlin V3), I found that it is almost useless for reading normal PDF files: the font is too small and the page is too wide. So I thought about a method to render PDFs for these small devices and built a prototype. The details are as follows:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
   pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a
2. Analyse the generated images and break each page into lines.
3. Divide each line that is long enough into two segments.
4. Rearrange the segments into a new page with half the width.

Example images before/after conversion are attached to this post. I think the result is acceptable. The source code is attached too. It is released under the GPL v2/v3.

Best Regards,
Huang Ying

Basic usage for version 0.4:
   tar -xjf pi_0.4.tar.bz2
   cd pi
   . env.sh
   cd test
   pi_format.py chap.conf
   /* output goes in out directory */
   img_dir_to_pdf.sh out chap-rf.pdf

2008-09-20 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.8
* overall: Reorganize program in a more modular way.
* pi.image: Add unpaper support for scanned books.
* pi.image: Add column compress support for scanned books.
* pi.divide: Add simple divider for divide = 1.

2008-08-30 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.7
* pi.py: Add LRF output support.
* pi.py: Add TOC support for LRF output format.
* pi.py: Add output rotate support.
* pdfminfo: Add pdfminfo to extract PDF information such as TOC, title, author, etc.
* overall: Add initial Windows support, thanks to ashkulz of the MobileRead forum.

2008-08-11 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.6
* pi.py: Initial implementation of embolden.
* pi.py: Use norm coordinate in class Page and Line.
* pi.py: Add edge trimming support.
* pi.py: Add run pages mode.
* pi.py: Add page range support.
* pi.py: Re-work ImageOutput, split multi-page image.
* pi.py: Rotate during scale if appropriate.
* img_dir_to_pdf.sh: Add color reduction support.

2008-05-17 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.5
* pi.py: Detect words, and break lines at a word end when possible.
* pi.py: Re-align the 'split line segment' (second half of line) with the next line's indent when appropriate. This makes first-line indents and bullet items line up better.
* img_dir_to_pdf.sh: Added to convert from images to PDF.

2008-05-10 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.4
* Some algorithms are configurable.
* For text that may cause problems, present both the merged and the divided version.

2008-05-03 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.3
* Rewrite most algorithms in Python except the image parsing (breaking the image into lines and characters). This makes it easier to add new algorithms (hacks).
* pi.py: Add some hacks to deal with equations and figures.

2008-04-29 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.2
* Split lines into two equal halves, or optionally equal thirds or equal quarters.
* Separate output image into customizable page size.
* Flex can be designated by user configuration.
* Calculate DPI for each page.
* Figure detection and special processing. The figures are scaled to page width and output twice, scaled and split.

2008-04-23 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.1

Last edited by caritas; 09-20-2008 at 08:14 AM. Reason: Version update |
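For readers who want to see steps 2 to 4 in miniature, here is a rough Python sketch. It is not the attached source: all function names are hypothetical, the page is assumed to be a list of rows of grayscale values (0 = black, 255 = white), and as a simplification it splits every line, not only the ones that are "long enough".

```python
# Illustrative sketch only; names and data layout are assumptions,
# not taken from the attached pi source.
WHITE = 255

def find_text_rows(img, threshold=128):
    """Step 2: return (top, bottom) row ranges that contain dark pixels."""
    ranges, start = [], None
    for y, row in enumerate(img):
        has_ink = any(p < threshold for p in row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            ranges.append((start, y))
            start = None
    if start is not None:
        ranges.append((start, len(img)))
    return ranges

def find_split_column(img, top, bottom, flex, threshold=128):
    """Step 3: pick the blank column nearest the middle, within +/- flex."""
    width = len(img[0])
    mid = width // 2
    blank = [x for x in range(max(0, mid - flex), min(width, mid + flex + 1))
             if all(img[y][x] >= threshold for y in range(top, bottom))]
    if not blank:
        return mid  # no gap between words found: fall back to a hard cut
    return min(blank, key=lambda x: abs(x - mid))

def reflow(img, flex):
    """Step 4: stack the two segments of each line on a narrower page."""
    width = len(img[0])
    out_width = width - width // 2 + flex  # half the width plus the flex
    out = []
    for top, bottom in find_text_rows(img):
        cut = find_split_column(img, top, bottom, flex)
        for x0, x1 in ((0, cut), (cut, width)):
            for y in range(top, bottom):
                seg = img[y][x0:x1]
                out.append(seg + [WHITE] * (out_width - len(seg)))
    return out
```

The sketch drops the blank space between lines and does nothing special for figures or equations, so it only shows the basic idea.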
04-22-2008, 10:58 AM | #2 |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
Result looks excellent for the amount of intelligence used in the algorithm.
This is a good hack for documents we can't reflow and resize. |
04-22-2008, 11:13 AM | #3 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Very nice idea, indeed!
I may try this out in PDFRead, as an alternative for smaller screen devices like the EBW1150. Hopefully I can just 'call' your executable from within PDFRead and avoid having to recode your efforts in Python. I remember that the original developer of PDFRead was going to allow some type of reflow of PDF documents, but never released his efforts. One question though: Is the split at half the page width "fixed" or can it be changed to a user-specified amount, like one-third or 25%?
|
04-22-2008, 09:18 PM | #4 |
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
>Is the split at half the page width "fixed" or can it be changed to a user inputted amount, like one-third or 25%?
For now it is fixed. But why split the line at 1/3 or 1/4? That would produce one long segment and one short segment from each original line. The page width generated now is 1/2 + 1/6 = 2/3 of the original page text width. The additional 1/6 leaves room to find a space between words near the split point. |
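To make the width arithmetic concrete, here is a tiny Python check. The function name and the idea of passing the segment count as a parameter are only assumptions for illustration; the tool at this point only splits in two.

```python
from fractions import Fraction

def output_width(parts, flex):
    """Output page width as a fraction of the original text width.

    parts: number of equal segments each line is cut into (assumed parameter)
    flex:  extra width allowed for searching for a gap between words
    """
    return Fraction(1, parts) + flex

print(output_width(2, Fraction(1, 6)))  # 2/3
```

Keeping the flex as its own argument makes it easy to try other values without touching the split itself.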
04-22-2008, 09:47 PM | #5 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
This is just like you do for 1/2 split (two equal halves with one line below the other). By extension, 1/4 split would result in four lines of text from one and quadruple the height! The reason this would be helpful would be to gain more clarity by rendering/cropping shorter lines for smaller screens. When I looked at your code, I thought this would be easy to do. I think the 1/6 would be constant amongst these differing split methods. Am I on the right track here? |
|
04-22-2008, 10:58 PM | #6 |
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
OK, I see. It is easy to add such a feature. And I think the 1/6 (flex) can be specified by the user, or derived from the PDF file itself (by analyzing the average number of characters per line).
|
04-23-2008, 07:44 AM | #7 |
Linux User
Posts: 323
Karma: 13682
Join Date: Aug 2007
Location: Germany
Device: Kindle 3
|
Very nice! Could you maybe include an option to split the resulting image into more than one image? For example, cut at around 33% of the height without cutting through the letters. I attached the original image that your program made and three images showing how the page could have been split with the option I'm thinking of.
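One possible way to do that kind of cut, sketched in Python: look for fully blank rows and cut at the one closest to each target height. This is only a guess at an approach, not how the tool does it; the names are made up and the image is assumed to be a list of rows of grayscale values (255 = white).

```python
# Hypothetical sketch: cut a tall image into pieces at blank rows only.

def blank_rows(img, threshold=128):
    """Indices of rows that contain no dark pixel."""
    return [y for y, row in enumerate(img) if all(p >= threshold for p in row)]

def choose_cuts(img, pieces):
    """Pick, for each target height, the nearest blank row to cut at."""
    height = len(img)
    blanks = blank_rows(img)
    cuts = []
    for i in range(1, pieces):
        target = height * i // pieces
        if blanks:
            cuts.append(min(blanks, key=lambda y: abs(y - target)))
        else:
            cuts.append(target)  # no blank row at all: cut at the target
    return sorted(set(cuts))

def split_image(img, pieces):
    """Return the sub-images between consecutive cut rows."""
    bounds = [0] + choose_cuts(img, pieces) + [len(img)]
    return [img[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

Because cuts snap to blank rows, the pieces will usually differ a little in height, and fewer pieces come back if two targets snap to the same row.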
|
04-23-2008, 09:26 AM | #8 |
Cache Ninja!
Posts: 643
Karma: 1002300
Join Date: Jan 2007
Location: Tokyo, Japan
Device: PRS-500, HTC Shift, iPod Touch, iPaq 4150, TC1100, Panasonic WordsGear
|
Interesting take on getting the ol' rasterized PDFs into your portable reader! Too bad the resolution isn't much better on these devices. I've been using my iPod Touch to read PDFs even though I have a Sony eReader. Still waiting on something better, but until then I might give this a shot. Guess it just chops pictures up in the mix, huh?
Thanks for the new slant on an old issue! |
04-23-2008, 01:55 PM | #9 |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quick comment, this doesn't compile under linux/ppc. Looks good tho, can it be scripted?
|
04-23-2008, 01:56 PM | #10 |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
I take that back, I had to delete the pi.o file, compiles fine, will test out.
|
04-23-2008, 06:36 PM | #11 |
Addict
Posts: 325
Karma: 1725
Join Date: Dec 2007
Location: Münster, Germany
Device: iRex iLiad v2
|
Hey, this is way cool, I'll give it a try on some of my PDFs!
|
04-23-2008, 08:23 PM | #12 |
Connoisseur
Posts: 59
Karma: 97
Join Date: Oct 2007
Location: New Jersey
Device: Sony PRS-500
|
If anyone compiles this under Windows, could they share it?
Thanks! |
04-24-2008, 03:51 AM | #13 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2008
Device: iphone
|
WOOW!
How about a version for old Chinese books whose text runs vertically? |
04-24-2008, 03:57 AM | #14 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
works flawlessly here (Linux Fedora FC8) - and much faster than I thought possible!
Next step I guess will be to reconstruct the document from the reformatted PGMs. Do you know a way? Alessandro |
04-24-2008, 05:35 PM | #15 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2008
Device: Sony PRS 500
|
I see a lot of potential in this idea. Some future improvements could be:
- OCR of the generated images to reconstruct the PDF
- Images (or otherwise unchoppable content) could be scaled down
Although the first one is not a trivial task... |