#1 |
|
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
Hi,
I have been interested in ebook readers for quite a while. But after trying a 6-inch e-ink reader (Hanlin V3), I found it almost useless for reading normal PDF files: the font is too small and the page is too wide. So I thought about a method to render PDFs for these small devices and prototyped it. The details are as follows:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
    pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a
2. Analyse the generated images and break each page into lines.
3. Divide each line that is long enough into two segments.
4. Rearrange the segments into a new page with half the width.

Example images before and after conversion are attached to this post. I think the result is acceptable. The source code is attached too. It is released under the GPL v2/v3.

Best Regards,
Huang Ying

Basic usage for version 0.4:
    tar -xjf pi_0.4.tar.bz2
    cd pi
    . env.sh
    cd test
    pi_format.py chap.conf
    /* output goes in out directory */
    img_dir_to_pdf.sh out chap-rf.pdf

2008-09-20 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.8
* overall: Reorganize program in a more modular way.
* pi.image: Add unpaper support for scanned books.
* pi.image: Add column compress support for scanned books.
* pi.divide: Add simple divider for divide = 1.

2008-08-30 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.7
* pi.py: Add LRF output support.
* pi.py: Add TOC support for LRF output format.
* pi.py: Add output rotate support.
* pdfminfo: Add pdfminfo to extract PDF information such as TOC, title, author, etc.
* overall: Add initial Windows support, thanks to ashkulz of the MobileRead forum.

2008-08-11 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.6
* pi.py: Initial implementation of embolden.
* pi.py: Use norm coordinate in class Page and Line.
* pi.py: Add edge trimming support.
* pi.py: Add run pages mode.
* pi.py: Add page range support.
* pi.py: Re-work ImageOutput, split multi-page image.
* pi.py: Rotate during scale if appropriate.
* img_dir_to_pdf.sh: Add color reduction support.

2008-05-17 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.5
* pi.py: Detect words, and break lines at a word end when possible.
* pi.py: Re-align the 'split line segment' (second half of a line) with the next line's indentation when appropriate. This makes first-line indents and bullet items line up better.
* img_dir_to_pdf.sh: Added to convert from images to PDF.

2008-05-10 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.4
* Some algorithms are configurable.
* For text that may cause problems, present both the merged and the divided version.

2008-05-03 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.3
* Rewrite most algorithms in Python, except the image parsing (breaking the image into lines and characters). This makes it easier to add new algorithms (hacks).
* pi.py: Add some hacks to deal with equations and figures.

2008-04-29 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.2
* Split lines into two equal halves, or optionally equal thirds or equal quarters.
* Separate the output image into a customizable page size.
* Flex can be designated by user configuration.
* Calculate DPI for each page.
* Figure detection and special processing. Figures are scaled to page width and output twice, scaled and split.

2008-04-23 Huang Ying <huang.ying.caritas@gmail.com>
* Version: 0.1

Last edited by caritas; 09-20-2008 at 09:14 AM. Reason: Version update |
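Steps 2 to 4 of the method can be illustrated with a small sketch. This is not the code from the attached archive: it assumes a page is a plain list of pixel rows (0 = white, 1 = ink), and the function names `find_lines`, `split_line` and `reflow` are made up for the example. A line counts as "long enough" here when its ink reaches past half the page width, which is also an assumption.

```python
# Illustrative sketch only, not the pi.py source.
# A page is a list of rows; each row is a list of 0 (white) / 1 (ink) pixels.

def find_lines(page):
    """Step 2: return (top, bottom) row ranges of text lines, bottom exclusive."""
    lines, start = [], None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(page)))
    return lines


def split_line(rows):
    """Step 3: split a long line at the blank column closest to its middle."""
    width = len(rows[0])
    target = width // 2
    ink_cols = [x for x in range(width) if any(r[x] for r in rows)]
    left, right = ink_cols[0], ink_cols[-1] + 1
    if right <= target:
        # Short line: it already fits in half the width, keep it whole.
        return [[r[:target] for r in rows]]
    gaps = [x for x in range(left + 1, right) if not any(r[x] for r in rows)]
    if not gaps:
        # No blank column to cut at: keep the line whole rather than cut ink.
        return [[r[:right] for r in rows]]
    cut = min(gaps, key=lambda x: abs(x - target))
    return [[r[:cut] for r in rows], [r[cut:right] for r in rows]]


def reflow(page):
    """Step 4: stack all segments into a new, narrower page."""
    segments = []
    for top, bottom in find_lines(page):
        segments.extend(split_line(page[top:bottom]))
    out_width = max(len(seg[0]) for seg in segments)
    out = []
    for seg in segments:
        for row in seg:
            out.append(row + [0] * (out_width - len(row)))  # pad on the right
        out.append([0] * out_width)        # one blank row between lines
    return out
```

The sketch cuts only at columns that are blank over the full height of the line, so a segment can end up somewhat wider than half the page when the nearest gap lies to the right of the middle.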
|
|
|
|
|
#2 |
|
Lector minore
Posts: 210
Karma: 728
Join Date: Jan 2008
Device: Palm Centro, Apple iPhone 3G and Sony PRS-505
|
Result looks excellent for the amount of intelligence used in the algorithm.
This is a good hack for documents we can't reflow and resize. |
|
|
|
|
|
#3 |
|
GuteBook/Mobi2IMP Creator
Posts: 2,495
Karma: 27182
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150 Device: REB1100, iLiad v2 System: WinXP SP3
|
Very nice idea, indeed!
I may try this out in PDFRead, as an alternative for smaller screen devices like the EBW1150. Hopefully I can just 'call' your executable from within PDFRead and avoid having to recode your efforts in Python. I remember that the original developer of PDFRead was going to allow some type of reflow of PDF documents, but never released his efforts. One question though: Is the split at half the page width "fixed" or can it be changed to a user inputted amount, like one-third or 25%?
__________________
-Nick ‹The REB1200 Guy› Have you tried GuteBook yet?
|
|
|
|
|
|
#4 |
|
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
> Is the split at half the page width "fixed" or can it be changed to a user inputted amount, like one-third or 25%?
For now it is fixed. But why split the line at 1/3 or 1/4? That would produce one longer line and one short line for each original line. The page width generated now is actually 1/2 + 1/6 = 2/3 of the original page text width. The additional 1/6 leaves room to find a space between words. |
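To make the 1/2 + 1/6 arithmetic concrete, here is a small sketch of how a cut position might be chosen inside that extra 1/6. The function name `choose_cut`, the list of gap positions and the rule "take the first gap at or after the midpoint" are assumptions for illustration, not the behaviour of the attached program.

```python
from fractions import Fraction


def choose_cut(gap_positions, text_width_px, flex=Fraction(1, 6)):
    """Pick where to cut a line of text.

    gap_positions: x coordinates (in pixels) of spaces between words
                   (hypothetical input, e.g. from a word detector).
    The cut must lie between the midpoint (1/2) and the output page
    width (1/2 + flex); without a usable gap, cut hard at the midpoint.
    """
    half = text_width_px // 2
    limit = int(text_width_px * (Fraction(1, 2) + flex))
    candidates = [x for x in gap_positions if half <= x <= limit]
    return min(candidates) if candidates else half


# For a text width of 1200 px the output page is 1200 * 2/3 = 800 px wide,
# so the cut may fall anywhere from 600 px to 800 px.
cut = choose_cut([100, 590, 640, 790, 900], 1200)
```

With these numbers the first segment is at most 800 px and the second at most 600 px, so both fit on the 2/3-width page.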
|
|
|
|
|
#5 | |
|
GuteBook/Mobi2IMP Creator
Posts: 2,495
Karma: 27182
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150 Device: REB1100, iLiad v2 System: WinXP SP3
|
Quote:
This is just like you do for 1/2 split (two equal halves with one line below the other). By extension, 1/4 split would result in four lines of text from one and quadruple the height! The reason this would be helpful would be to gain more clarity by rendering/cropping shorter lines for smaller screens. When I looked at your code, I thought this would be easy to do. I think the 1/6 would be constant amongst these differing split methods. Am I on the right track here?
|
|
|
|
|
|
|
#6 |
|
Enthusiast
Posts: 26
Karma: 161
Join Date: Feb 2008
Device: Sony PRS505
|
OK, I see. It is easy to add such a feature. I think the 1/6 (flex) could also be specified by the user, or derived from the PDF file itself by analyzing the average number of characters per line.
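As a worked example of the splits discussed above, the sketch below computes the output page width for a 2-, 3- or 4-way split. It assumes, as suggested in the thread, that the flex stays a constant 1/6 for every split; the function name `split_width` is made up for the example.

```python
from fractions import Fraction


def split_width(parts, flex=Fraction(1, 6)):
    """Output page width as a fraction of the original text width.

    Each original line becomes `parts` lines, so the text height grows
    by the same factor while the width shrinks to 1/parts + flex.
    """
    if parts < 2:
        raise ValueError("parts must be at least 2")
    return Fraction(1, parts) + flex


for parts in (2, 3, 4):
    print(f"{parts}-way split: width {split_width(parts)}, height x{parts}")
```

With a flex of 1/6 this gives widths of 2/3, 1/2 and 5/12 of the original text width. A user-specified flex would simply replace the default argument.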
|
|
|
|
|
|
#7 |
|
Linux User
Posts: 305
Karma: 1082
Join Date: Aug 2007
Location: Germany
Device: Cybook Gen3
|
Very nice! Could you maybe include an option to split the resulting image into more than one image? For example, cut at around 33% of the height without cutting through the letters. I attached the original image that your program made and three images showing how the page could have been split with the option I'm thinking of.
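One way such an option could work is to cut only on blank rows, choosing the blank row nearest to each 1/3 mark. The sketch below assumes the page is a list of pixel rows (0 = white, 1 = ink); `page_breaks` and `split_page` are hypothetical names, not part of the attached program.

```python
def page_breaks(page, pieces=3):
    """Return row indices to cut at, each on a blank row near k/pieces of the height."""
    height = len(page)
    blank = [y for y, row in enumerate(page) if not any(row)]
    breaks = []
    for k in range(1, pieces):
        target = height * k // pieces
        previous = breaks[-1] if breaks else 0
        candidates = [y for y in blank if y > previous]
        if not candidates:
            break                          # no blank row left to cut on
        breaks.append(min(candidates, key=lambda y: abs(y - target)))
    return breaks


def split_page(page, pieces=3):
    """Split the page into sub-images without cutting through any ink row."""
    edges = [0] + page_breaks(page, pieces) + [len(page)]
    return [page[a:b] for a, b in zip(edges, edges[1:])]
```

Because every cut sits on a row without ink, no letter is sliced. The pieces are only approximately equal in height, and a page without blank rows comes back as a single piece.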
|
|
|
|
|
|
#8 |
|
Cache Ninja!
Posts: 644
Karma: 2300
Join Date: Jan 2007
Location: Tokyo, Japan
Device: PRS-500, HTC Shift, iPod Touch, iPaq 4150, TC1100, Panasonic WordsGear
|
Interesting take on getting the ol' rasterized PDFs into your portable reader! Too bad the resolution isn't much better on these devices. I've been using my iPod Touch to read PDFs even though I have a Sony eReader. Still waiting on something better, but until then I might give this a shot. Guess it just chops pictures up in the mix, huh?
Thanks for the new slant on an old issue! |
|
|
|
|
|
#9 |
|
Zealot
Posts: 123
Karma: 104
Join Date: Jan 2008
Device: Sony Reader PRS-505
|
Quick comment, this doesn't compile under linux/ppc. Looks good tho, can it be scripted?
|
|
|
|
|
|
#10 |
|
Zealot
Posts: 123
Karma: 104
Join Date: Jan 2008
Device: Sony Reader PRS-505
|
I take that back, I had to delete the pi.o file, compiles fine, will test out.
|
|
|
|
|
|
#11 |
|
Addict
Posts: 322
Karma: 1723
Join Date: Dec 2007
Location: Münster, Germany
Device: iRex iLiad v2
|
Hey, this is way cool, I'll give it a try on some of my PDFs!
__________________
Best regards, Thomas Newest project: Automatic Downloader Framework - download newspapers etc. while connecting to iDS March 8 - 14: Read an E-Book Week |
|
|
|
|
|
#12 |
|
Connoisseur
Posts: 50
Karma: 97
Join Date: Oct 2007
Location: New Jersey
Device: Sony PRS-500
|
If anyone compiles this under Windows, could they share it?
Thanks! |
|
|
|
|
|
#13 |
|
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2008
Device: iphone
|
WOOW!
How about a version for old Chinese books, which are set in vertical lines? |
|
|
|
|
|
#14 |
|
Groupie
Posts: 168
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony Reader
|
works flawlessly here (Linux Fedora FC8) - and much faster than I thought possible!
Next step I guess will be to reconstruct the document from the reformatted PGMs. Do you know a way? Alessandro |
|
|
|
|
|
#15 |
|
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2008
Device: Sony PRS 500
|
I see a lot of potential in this idea. Some future improvements could be:
* OCR of the generated images to reconstruct the PDF
* Images (or otherwise unchoppable content) could be scaled down
Although the first one is not a trivial task... |
|
|
|