Old 04-25-2008, 08:50 AM   #16
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by vinniet View Post
If anyone compiles this under Windows, can they share it?

Thanks!
Yes, please, my cygwin installation needs updating and is not working yet...
Old 04-25-2008, 12:26 PM   #17
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by nrapallo View Post
Yes, please, my cygwin installation needs updating and is not working yet...
OK, my update is now finished. The attached .zip contains a Windows executable compiled under Cygwin. It requires the two .dll files included in the archive to run the program 'pi.exe'. The DLLs can be placed in a directory on your PATH instead of the directory where pi.exe is. (BTW, pi = pdftoimage)

Attached is a sample (converted to .gif) that was produced by invoking:
Code:
pi page.pgm output.pgm
Notice how the indenting is retained, but this sometimes breaks down for bullet items. This is a great technique!
Attached Thumbnails
page.gif (194.7 KB)
output.gif (185.8 KB)
Attached Files
File Type: zip pi-exe.zip (801.1 KB, 637 views)
File Type: zip page-pgm.zip (174.5 KB, 482 views)

Last edited by nrapallo; 04-25-2008 at 12:55 PM. Reason: added sample page.pgm file (zipped, of course)
Old 04-25-2008, 12:28 PM   #18
nrapallo
GuteBook/Mobi2IMP Creator
Windows executable now available

Intentionally deleted.

p.s. I've always wanted to say that...

Last edited by nrapallo; 04-25-2008 at 10:14 PM. Reason: see above post for Windows executable
Old 04-25-2008, 12:42 PM   #19
DaleDe
Grand Sorcerer
Posts: 9,782
Karma: 5137308
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by nrapallo View Post
Those using Windows who want to try this should get the compiled windows executable located here!
You need to collect all this stuff and make a wiki for it.

Dale
Old 04-25-2008, 12:48 PM   #20
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by DaleDe View Post
You need to collect all this stuff and make a wiki for it.

Dale
Soon...

But that's like writing documentation and stuff... Brain freeze time.

Just ask tompe what his favourite task is NOT!
Old 04-25-2008, 01:04 PM   #21
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by nrapallo View Post
Notice how the indenting is retained, but this sometimes breaks down for bullet items.
caritas:

An alternative way to align the 'next line segment' with the 'previous line segment' could be employed.

I think the 'next line segment' (the second half of a split line) should be re-aligned with the indent of the line that follows it, if that line is not blank. This may help the bullet items line up better and avoid two indented lines when the first line of a paragraph is split.

What do you think?
Old 04-25-2008, 06:24 PM   #22
IceHand
Linux User
Posts: 309
Karma: 1082
Join Date: Aug 2007
Location: Germany
Device: Kindle 3
Here's a little Bash script that converts all PDFs in the current folder: it first renders them to PGMs with pdftoppm, runs this algorithm on the PGMs, adds a small white border around the text (I like to have a small border), crops every image into three overlapping images (it's not exactly what I had in mind in the above post, but it'll work well enough in most cases) and finally converts the result back to a new PDF.

Requirements: pi, pdftoppm (part of Poppler and/or Xpdf), ImageMagick (v6.3.2 or newer is needed for the -extent option to work properly), libtiff (for tiffcp and tiff2pdf)

Code:
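#!/bin/bash
# Reflow every PDF in the current folder with pi and rebuild it as New-<name>.pdf.
# set -e aborts the whole run as soon as any command fails.
set -e

for pdf in *.pdf; do
	if [ -f "$pdf" ]
	then
		echo "Converting file \"$pdf\". Please wait ..."
		PDFName="$(basename "$pdf" .pdf)"
		mkdir "Temp-$PDFName"
		cd "Temp-$PDFName"
		# Render every page as a 180 dpi greyscale PGM.
		pdftoppm -r 180 -gray "../$pdf" "$PDFName"

		# Run the pi algorithm on every page.
		for page in *.pgm; do
			pi "$page" "New-$page"
			rm "$page"
		done

		# Add a small white border and convert to uncompressed TIFF.
		for page in *.pgm; do
			convert "$page" +compress -gravity Center -extent "106%x101%" -gravity East -extent "104%x100%" "$(basename "$page" .pgm).tif"
			rm "$page"
		done

		# Crop every page into three overlapping images (top, middle, bottom).
		for page in *.tif; do
			base="$(basename "$page" .tif)"
			convert "$page" +compress -gravity North -crop "100%x34%" +repage -depth 8 "${base}-1.tif"
			convert "$page" +compress -gravity Center -crop "100%x34%" +repage -depth 8 "${base}-2.tif"
			convert "$page" +compress -gravity South -crop "100%x34%" +repage -depth 8 "${base}-3.tif"
			rm "$page"
		done

		# Join all crops into one multi-page TIFF, then convert it to PDF.
		tiffcp *.tif "New-$PDFName.tif"
		tiff2pdf -z "New-$PDFName.tif" -o "New-$PDFName.pdf" -t "$PDFName"

		# Clean up and move the result next to the original PDF.
		rm *.tif
		mv "New-$PDFName.pdf" ../
		cd ..
		rmdir "Temp-$PDFName"
	else
		echo "ERROR: No PDF files found"
		exit 1
	fi
done

echo "Done."
exit 0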
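
A small pre-flight check could sit in front of a script like this, so that a missing tool is reported before any temporary folder is created. This is only a sketch: the helper name missing_tools is made up for illustration, and the tool list is simply the commands the script above calls.

```shell
#!/bin/bash
# Print the name of every given tool that cannot be found on the PATH.
# The tool names are passed as arguments, so the list is easy to adjust.
missing_tools() {
	local tool
	for tool in "$@"; do
		if ! command -v "$tool" >/dev/null 2>&1; then
			echo "$tool"
		fi
	done
}

# Commands used by the conversion script (convert comes from ImageMagick,
# tiffcp and tiff2pdf from the libtiff tools).
missing="$(missing_tools pi pdftoppm convert tiffcp tiff2pdf)"
if [ -n "$missing" ]; then
	# $missing is left unquoted on purpose so the names print on one line.
	echo "Missing tools:" $missing
fi
```

If the list comes back non-empty, the script could stop there instead of failing halfway through a book.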
Old 04-25-2008, 10:52 PM   #23
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by IceHand View Post
...make a small white border around the text (I like to have a small border), crop every image to three overlapping images (it's not exactly what I had in mind in the above post, but it'll work well enough in most cases) and finally convert it back to a new PDF.
Just out of curiosity, when you split the resulting images (which are doubled in height) by three vertically, does the result look good (aspect-ratio-wise) on your ereader? This split by three should work best if the original pdf has little or no white margin.

I think if the original pdf has at least 20% white space on the side/margin (that gets cropped out anyway), then to retain the newer cropped image's aspect ratio, a split by four vertically should (theoretically) look better.

Any (practical) thoughts?

EDIT: the split by three or four refers to the vertical cropping, not to where each line is split horizontally (in half, or in future into thirds, quarters...).

Last edited by nrapallo; 04-26-2008 at 04:16 PM. Reason: clarified what split by three or four means
Old 04-26-2008, 09:26 AM   #24
IceHand
Linux User
Quote:
Originally Posted by nrapallo View Post
[...] I think if the original pdf has at least 20% white space on the side/margin (that gets cropped out anyway), then to retain the newer cropped image's aspect ratio, a split by four should (theoretically) look better.

Any (practical) thoughts?
Well, it rather depends on the original PDF and how large you want the text to look. I attached an image of a random page showing how it looks with the split by three on my Cybook (set to "Fit Height"). You can see there are rather large borders on the left and right side. If I set it to "Fit Width" there are only the small borders I added via my script, but the text size is too large for my taste and I would have to scroll down to see the whole page.
Attached Thumbnails
cybook_pdf_pi.png (32.1 KB)
Old 04-26-2008, 09:39 AM   #25
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by IceHand View Post
Well, it rather depends on the original PDF and how large you want the text to look. I attached an image of a random page showing how it looks with the split by three on my Cybook (set to "Fit Height"). You can see there are rather large borders on the left and right side. If I set it to "Fit Width" there are only the small borders I added via my script, but the text size is too large for my taste and I would have to scroll down to see the whole page.
Exactly why a split by 'four' may be useful: to get rid of those wider-than-acceptable white margins.

The size of the resulting text is largely due to the page size of the original pdf. If starting with an A4/Letter sized pdf with 10/12pt text, then the resulting text should not be so large as to be unacceptable. If starting with a Sony ereader sized pdf then why bother; the resulting text will appear huge.

In the end this technique will work best when the original pdf, as viewed on the ereader hardware, is just too small to read comfortably.

I have only done a few tests, but think that in real world conversions, the better split ratio will be between three and four.
Old 04-26-2008, 01:44 PM   #26
IceHand
Linux User
Quote:
Originally Posted by nrapallo View Post
Exactly why a split by 'four' may be useful: to get rid of those wider-than-acceptable white margins. [...]
You're talking about the width split made by the algorithm, right? If yes, I agree. Having longer lines would definitely be a good thing.
The concept is nice, but far from perfect yet. I noticed that the program aborts with a segmentation fault error when processing some pages, mostly with images.

Last edited by IceHand; 04-26-2008 at 05:18 PM.
Old 04-26-2008, 01:59 PM   #27
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by IceHand View Post
You're talking about the width split made by the algorithm, right? If yes, I agree. Having longer lines would definitely be a good thing.
The concept is nice, but far from perfect yet. I noticed that the program aborts with a segmentation fault error when processing some pages, mostly with images.
No, the height split!

After the pi processing, your bash script splits the resulting image in three vertically. I am speculating that, in general, splitting by three or four would be the optimal way to view it on an ereader screen. Can you try splitting by four vertically for a directory of PDFs and compare with your first results (splitting by three vertically)? Is the large white margin issue lessened with the split by four?
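
A split by four cannot be done with the three North/Center/South gravity crops, so the crop offsets would have to be computed. Here is a minimal sketch, assuming the image size is already known (for example from ImageMagick's identify). The function name slice_geometries and the 10 pixel overlap are illustrative choices, not part of pi or of the script:

```shell
#!/bin/bash
# Print one ImageMagick crop geometry (WxH+X+Y) per slice.
# Arguments: image width, image height, number of slices, overlap in pixels.
slice_geometries() {
	local width=$1 height=$2 count=$3 overlap=$4
	# Height of one slice: an equal share of the page plus the overlap.
	local slice_h=$(( (height + count - 1) / count + overlap ))
	local k=0 offset
	# Never make a slice taller than the page itself.
	if [ "$slice_h" -gt "$height" ]; then
		slice_h=$height
	fi
	while [ "$k" -lt "$count" ]; do
		if [ "$count" -gt 1 ]; then
			# Spread the slices evenly so the last one ends at the bottom edge.
			offset=$(( k * (height - slice_h) / (count - 1) ))
		else
			offset=0
		fi
		echo "${width}x${slice_h}+0+${offset}"
		k=$(( k + 1 ))
	done
}

# Example: a 600x1000 image cut into four slices with a 10 pixel overlap.
slice_geometries 600 1000 4 10
```

For the example call this prints 600x260+0+0, 600x260+0+246, 600x260+0+493 and 600x260+0+740. Each geometry could then be handed to convert as -crop "$geometry" +repage in place of the fixed gravity crops.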

I've run into many segmentation faults as well, primarily converting coloured pages. I think the routine that tries to identify the individual lines of text may not be as robust when there are no 'white line gaps' between the lines of text. It needs better bounds checking or defaults if something goes wrong.
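
Until pi itself is more robust, a calling script can at least survive a crash. A rough sketch, assuming it is acceptable to keep the unprocessed page when the conversion fails; convert_or_copy is a hypothetical helper, not something pi provides:

```shell
#!/bin/bash
# Run a converter on one page. If it fails (for example with a
# segmentation fault), fall back to an unmodified copy of the page.
# $1 = converter command, $2 = input file, $3 = output file.
convert_or_copy() {
	local tool=$1 input=$2 output=$3
	# A failing command inside an if condition does not trigger set -e.
	if "$tool" "$input" "$output"; then
		return 0
	fi
	echo "WARNING: $tool failed on $input, keeping the original page" >&2
	# Overwrites any partial output the converter may have left behind.
	cp "$input" "$output"
}
```

It would be called as convert_or_copy pi in.pgm out.pgm. The page that crashed stays unreflowed, but the rest of the book still gets converted.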
Old 04-26-2008, 05:27 PM   #28
IceHand
Linux User
Quote:
Originally Posted by nrapallo View Post
No, the height split!

After the pi processing, your bash script splits the resulting image in three vertically. I am speculating that, in general, splitting by three or four would be the optimal way to view it on an ereader screen. Can you try splitting by four vertically for a directory of PDFs and compare with your first results (splitting by three vertically)? Is the large white margin issue lessened with the split by four?
I'm sorry, I was confused there for a second (and edited my post, but you'd already quoted me so I changed it back).
Anyway, yes, splitting by four gives very good results concerning the left and right border – the text size is rather large though (see attached image).

It would be great if the pi algorithm had an option to specify the page width and height in pixels and optimise the line breaks accordingly. The size of the text could then be defined by changing the image density (dpi) with pdftoppm. Right now there's no difference when viewing it with my Cybook, because the number of characters/words per line will always be the same (well, of course the text will look fuzzy if the density is set too low).
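
If pi had such an option, the density to pass to pdftoppm could be derived from the page width. A quick sketch of the arithmetic only (dpi = pixels * 25.4 / millimetres); the function name dpi_for_width and the example numbers are assumptions for illustration:

```shell
#!/bin/bash
# Resolution (dpi) at which a page of the given width in millimetres
# renders to the given width in pixels, rounded to the nearest integer.
# dpi = pixels * 25.4 / mm, done in integer arithmetic (scaled by 10).
dpi_for_width() {
	local pixels=$1 width_mm=$2
	echo $(( (pixels * 254 + width_mm * 5) / (width_mm * 10) ))
}

# Example: render a 152 mm wide page 1200 pixels wide.
dpi_for_width 1200 152
```

The example prints 201, which would be used as pdftoppm -r 201. This is the width of the rendered page before pi touches it; how pi changes the width afterwards is not covered here.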
Attached Thumbnails
cybook_pi2.png (41.2 KB)

Last edited by IceHand; 04-26-2008 at 05:42 PM.
Old 04-26-2008, 05:56 PM   #29
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by IceHand View Post
Anyway, yes, splitting by four gives very good results concerning the left and right border – the text size is rather large though (see attached image).

It would be great if the pi algorithm would have an option to specify the page width and height in pixels and optimise the line breaks accordingly.
BTW, what was the size of the original pdf? A4/Letter size? That text does look uncomfortably large.

I would suspect that the text would be more reasonable (size-wise) if the original had more words per line.

It is good that we are 'flushing' out what would be nice, in case the original poster wants to improve his algorithm to incorporate:
Wish list
1) split lines in two equal halves or optional equal thirds or equal quarters
2) crop the resulting image by three or four to retain the ereader's aspect ratio (usually 0.75 = 480/640) instead of just having a doubled height page.
3) allow the 1/6 flex to be calculated by some words per line estimate or user-input.
4) allow coloured backgrounds/text input images instead of just grayscale (pgm vs ppm/png/pbm). Accommodate images somehow, perhaps shrink them down.
5) re-align the 'split line segment' (second half of line) to align with the next line's indenting if it's not blank. This will make the first line indent and bullet items line up better.
6) avoid segmentation faults
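
For item 2, the number of slices could be derived from the image size instead of being fixed. A back-of-the-envelope sketch, assuming a target aspect ratio of 0.75; slice_count is an illustrative name and the rounding rule is my own choice:

```shell
#!/bin/bash
# Number of vertical slices that brings each slice close to a
# width:height ratio of 3:4 (0.75), i.e. round(0.75 * height / width).
slice_count() {
	local width=$1 height=$2
	# 0.75 * height / width == 3 * height / (4 * width); adding 2 * width
	# to the numerator rounds to the nearest integer.
	local count=$(( (3 * height + 2 * width) / (4 * width) ))
	# Always produce at least one slice.
	if [ "$count" -lt 1 ]; then
		count=1
	fi
	echo "$count"
}

# Example: a 620x3508 pixel image.
slice_count 620 3508
```

The example prints 4. The 620x3508 size assumes an A4 page rendered at 150 dpi (about 1240x1754) whose width is roughly halved and height doubled by pi; with wide margins cropped off first, the numbers would shift.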

Last edited by nrapallo; 04-26-2008 at 06:15 PM.
Old 04-26-2008, 06:09 PM   #30
IceHand
Linux User
Quote:
Originally Posted by nrapallo View Post
BTW, what was the size of the original pdf? A4/Letter size? That text does look uncomfortably large.

I would suspect that the text would be more reasonable (size-wise) if the original had more words per line.
152x225 mm with about 12-16 words per line. And yes, if the text had had more words per line it would have looked better size wise.

Quote:
Originally Posted by nrapallo View Post
Wish list
1) split lines in two equal halves or optional equal thirds or equal quarters
2) crop the resulting image by three or four to retain the ereader's aspect ratio (usually 0.75 = 480/640) instead of just having a doubled height page.
3) allow the 1/6 flex to be calculated by some words per line estimate or user-input.
4) allow coloured backgrounds/text input images instead of just grayscale (pgm vs ppm/png/pbm). Accommodate images somehow, perhaps shrink them down.
5) avoid segmentation faults
6) fix the indentation algorithm. Right now the line that goes after the indented line will be indented as well.
EDIT: Ah, you've noticed that problem too