04-25-2008, 07:50 AM | #16 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
|
04-25-2008, 11:26 AM | #17 | |
GuteBook/Mobi2IMP Creator
|
Quote:
Attached is a sample (converted to .gif) that was produced by invoking:
Code:
pi page.pgm output.pgm
Last edited by nrapallo; 04-25-2008 at 11:55 AM. Reason: added sample page.pgm file (zipped, of course) |
|
04-25-2008, 11:28 AM | #18 |
GuteBook/Mobi2IMP Creator
|
Windows executable now available
Intentionally deleted.
p.s. I've always wanted to say that...
Last edited by nrapallo; 04-25-2008 at 09:14 PM. Reason: see above post for Windows executable |
04-25-2008, 11:42 AM | #19 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
04-25-2008, 11:48 AM | #20 |
GuteBook/Mobi2IMP Creator
|
|
04-25-2008, 12:04 PM | #21 | |
GuteBook/Mobi2IMP Creator
|
Quote:
An alternative way to align the 'next line segment' with the 'previous line segment' could be employed. I think the 'next line segment' should be re-aligned with its 'next line segment' if not blank. This may help the bullet items line up better and avoid two indented lines when the beginning line of a paragraph is split. What do you think? |
|
04-25-2008, 05:24 PM | #22 |
Linux User
Posts: 323
Karma: 13682
Join Date: Aug 2007
Location: Germany
Device: Kindle 3
|
Here's a little Bash script that converts all PDFs in the current folder to PGMs with pdftoppm, runs this algorithm on the PGMs, adds a small white border around the text (I like to have a small border), crops every image into three overlapping images (it's not exactly what I had in mind in the above post, but it'll work well enough in most cases) and finally converts the result back to a new PDF.
Requirements: pi, pdftoppm (part of Poppler and/or Xpdf), ImageMagick (v6.3.2 or newer is needed for the -extent option to work properly), libtiff
Code:
#!/bin/bash
set -e

for pdf in *.pdf; do
    # If no PDF matches, the glob stays literal and the -f test fails.
    if [ -f "$pdf" ]; then
        echo "Converting file \"$pdf\". Please wait ..."
        PDFName="$(basename "$pdf" .pdf)"
        mkdir "Temp-$PDFName"
        cd "Temp-$PDFName"

        # Render each page as a grayscale PGM at 180 dpi.
        pdftoppm -r 180 -gray "../$pdf" "$PDFName"

        # Run the pi algorithm on every page.
        for page in *.pgm; do
            pi "$page" "New-$page"
            rm "$page"
        done

        # Add a small white border and convert to TIFF.
        for page in *.pgm; do
            convert "$page" +compress -gravity Center -extent "106%x101%" -gravity East -extent "104%x100%" "$(basename "$page" .pgm).tif"
            rm "$page"
        done

        # Crop every page into three overlapping images (top, middle, bottom).
        for page in *.tif; do
            convert "$page" +compress -gravity North -crop "100%x34%" +repage -depth 8 "$(basename "$page" .tif)-1.tif"
            convert "$page" +compress -gravity Center -crop "100%x34%" +repage -depth 8 "$(basename "$page" .tif)-2.tif"
            convert "$page" +compress -gravity South -crop "100%x34%" +repage -depth 8 "$(basename "$page" .tif)-3.tif"
            rm "$page"
        done

        # Merge the slices into one TIFF and convert it back to a PDF.
        tiffcp *.tif "New-$PDFName.tif"
        tiff2pdf -z "New-$PDFName.tif" -o "New-$PDFName.pdf" -t "$PDFName"
        rm *.tif
        mv "New-$PDFName.pdf" ../
        cd ..
        rmdir "Temp-$PDFName"
    else
        echo "ERROR: No PDF files found"
        exit 1
    fi
done

echo "Done."
exit 0
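To see why the three crops overlap, here is a rough sketch of the row ranges that three 34% slices cover. The function name slice_rows is made up for illustration, and it assumes North/Center/South gravity anchor the slice at the top, middle and bottom of the page; ImageMagick's own rounding may differ by a pixel.

```shell
#!/bin/bash
# Hypothetical helper, not part of the script above.
# slice_rows HEIGHT: print the first and last pixel row of each of the three
# crops, assuming every slice is 34% of the page height.
slice_rows() {
    local height=$1
    local slice=$(( (height * 34 + 50) / 100 ))   # 34% of height, rounded
    local middle=$(( (height - slice) / 2 ))      # Center gravity offset
    local bottom=$(( height - slice ))            # South gravity offset
    echo "North 0 $(( slice - 1 ))"
    echo "Center $middle $(( middle + slice - 1 ))"
    echo "South $bottom $(( height - 1 ))"
}
```

For a page 1000 rows high, slice_rows 1000 prints North 0 339, Center 330 669 and South 660 999, so neighbouring slices share about 10 rows and a line of text cut at one slice boundary should still appear whole in the next slice.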
04-25-2008, 09:52 PM | #23 | |
GuteBook/Mobi2IMP Creator
|
Quote:
I think if the original pdf has at least 20% white space on the side/margin (that gets cropped out anyway), then to retain the newer cropped image's aspect ratio, a split by four vertically should (theoretically) look better. Any (practical) thoughts?
EDIT: the split by three or four refers to vertical cropping and not to the place where the line is split horizontally in half (or, in future, thirds, quarters...).
Last edited by nrapallo; 04-26-2008 at 03:16 PM. Reason: clarified what split by three or four means |
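The three-versus-four question can be put into numbers. The sketch below is only an illustration: best_split is a made-up name, it takes the width and height of the image after pi has processed it, ignores the overlap between slices, and assumes a screen aspect ratio of 0.75 (e.g. 480/640) unless told otherwise.

```shell
#!/bin/bash
# Hypothetical helper: pick the number of vertical slices whose aspect ratio
# (width / slice height) comes closest to the screen's.
# best_split WIDTH HEIGHT [TARGET_MILLI]; TARGET_MILLI defaults to 750 (0.75).
best_split() {
    local width=$1 height=$2 target=${3:-750}
    local best_n=1 best_diff=-1 n aspect diff
    for n in 1 2 3 4 5 6; do
        aspect=$(( width * n * 1000 / height ))   # slice aspect in thousandths
        diff=$(( aspect - target ))
        if (( diff < 0 )); then diff=$(( -diff )); fi
        if (( best_diff < 0 || diff < best_diff )); then
            best_n=$n
            best_diff=$diff
        fi
    done
    echo "$best_n"
}
```

With made-up dimensions, best_split 900 4000 gives 3 and best_split 700 4000 gives 4: the narrower the text column left after the margins are cropped, the more slices are needed to stay near 0.75.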
|
04-26-2008, 08:26 AM | #24 |
Linux User
|
Well, it rather depends on the original PDF and how large you want the text to look. I attached an image of a random page showing how it would look with the split by three on my Cybook (set to "Fit Height"). You can see there are rather large borders on the left and right side. If I set it to "Fit Width" there are only the small borders I added via my script, but the text size is too large for my taste and I would have to scroll down to see the whole page.
|
04-26-2008, 08:39 AM | #25 | |
GuteBook/Mobi2IMP Creator
|
Quote:
The size of the resulting text is largely due to the page size of the original pdf. If starting with an A4/Letter sized pdf with 10/12pt text, then the resulting text should not be so large as to be unacceptable. If starting with a Sony ereader sized pdf then why bother, the resulting text will appear huge. In the end this technique will work best when the original pdf, as viewed on the ereader hardware, is just too small to read comfortably. I have only done a few tests, but think that in real world conversions, the better split ratio will be between three and four. |
|
04-26-2008, 12:44 PM | #26 | |
Linux User
|
Quote:
The concept is nice, but far from perfect yet. I noticed that the program aborts with a segmentation fault when processing some pages, mostly ones with images.
Last edited by IceHand; 04-26-2008 at 04:18 PM. |
|
04-26-2008, 12:59 PM | #27 | |
GuteBook/Mobi2IMP Creator
|
Quote:
After the pi processing, your bash script splits the resulting image in three vertically. I am speculating that, in general, splitting by three or four would be the optimal way to view on an ereader screen. Can you try splitting by four vertically for a directory of pdfs and compare with your first results (splitting by three vertically)? Is the large white margin issue lessened with the split by four?
I've run into many segmentation faults as well, primarily when converting coloured pages. I think the routine that tries to identify the individual lines of text may not be as robust when there are no 'white line gaps' between the lines of text. It needs better bounds checking or defaults if something goes wrong. |
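Until pi itself gets better bounds checking, a batch script can at least survive a crash on a single page. This is a minimal sketch, not a fix for pi: convert_or_copy is a made-up wrapper that runs a command on one page and, if the command fails (bash typically reports status 139 on Linux for a segmentation fault), keeps the unprocessed page instead of aborting.

```shell
#!/bin/bash
# Hypothetical wrapper, safe to use under "set -e".
# Usage: convert_or_copy INPUT OUTPUT COMMAND [ARGS...]
# Runs: COMMAND [ARGS...] INPUT OUTPUT
convert_or_copy() {
    local input=$1 output=$2
    shift 2
    local status=0
    # "|| status=$?" records the failure without triggering errexit.
    "$@" "$input" "$output" || status=$?
    if (( status != 0 )); then
        echo "Warning: '$1' failed on $input (status $status); keeping original" >&2
        # Overwrite any partial output with the untouched input page.
        cp -- "$input" "$output"
    fi
    return 0
}
```

In a per-page loop this would be called as convert_or_copy "$page" "New-$page" pi, so pages that crash pi pass through unchanged and the rest of the document still gets converted.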
|
04-26-2008, 04:27 PM | #28 | |
Linux User
|
Quote:
Anyway, yes, splitting by four gives very good results concerning the left and right border – the text size is rather large though (see attached image). It would be great if the pi algorithm had an option to specify the page width and height in pixels and optimise the line breaks accordingly. The size of the text could then be defined by changing the image density (dpi) with pdftoppm. Right now changing the density makes no difference when viewing it with my Cybook, because the number of characters/words per line will always be the same (well, of course the text will look fuzzy if the density is set too low).
Last edited by IceHand; 04-26-2008 at 04:42 PM. |
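To illustrate how the density would matter if pi accepted a target page width, here is some rough arithmetic. Nothing here is an existing pi option: both function names are made up, the page width is given in points (1/72 inch, e.g. 612 for Letter), margins are ignored, and the 600 pixel screen width used below is just an assumed example value.

```shell
#!/bin/bash
# Illustrative arithmetic only.
# rendered_width DPI PAGE_WIDTH_PT: page width in pixels at the given
# density, rounded to the nearest pixel.
rendered_width() {
    local dpi=$1 width_pt=$2
    echo $(( (width_pt * dpi + 36) / 72 ))
}

# segments_per_line DPI PAGE_WIDTH_PT SCREEN_WIDTH_PX: how many pieces each
# full-width line would need so that one piece fits the screen width.
segments_per_line() {
    local px
    px=$(rendered_width "$1" "$2")
    echo $(( (px + $3 - 1) / $3 ))   # ceiling division
}
```

A Letter page rendered at 180 dpi is 1530 pixels wide, so segments_per_line 180 612 600 gives 3; at 120 dpi it is 1020 pixels wide and the result is 2. A higher density would then mean larger text and more splits per line, and a lower one smaller text and fewer splits.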
|
04-26-2008, 04:56 PM | #29 | |
GuteBook/Mobi2IMP Creator
|
Quote:
I would suspect that the text would be more reasonable (size-wise) if the original had more words per line. It is good that we are 'flushing' out what would be nice, in case the original poster wants to improve his algorithm to incorporate:
Wish list
1) split lines in two equal halves or optional equal thirds or equal quarters
2) crop the resulting image by three or four to retain the ereader's aspect ratio (usually 0.75 = 480/640) instead of just having a doubled height page.
3) allow the 1/6 flex to be calculated by some words per line estimate or user-input.
4) allow coloured backgrounds/text input images instead of just grayscale (pgm vs ppm/png/pbm). Accommodate images somehow, perhaps shrink them down.
5) re-align the 'split line segment' (second half of line) to align with the next line's indenting if it's not blank. This will make the first line indent and bullet items line up better.
6) avoid segmentation faults
Last edited by nrapallo; 04-26-2008 at 05:15 PM. |
|
04-26-2008, 05:09 PM | #30 | ||
Linux User
|
Quote:
Quote:
EDIT: Ah, you've noticed that problem too |
||
|