11-08-2008, 05:58 AM   #1
Taesoo Kwon
Enthusiast
 
Posts: 27
Karma: 163
Join Date: Nov 2008
Device: Kobo wifi
[Tool] Multi-column PDF files on 6-inch displays

I developed a program that converts PDF documents, such as articles and technical papers, into a sequence of GIF images so that they are readable on the small screens of e-book devices. The program automatically detects contiguous, non-empty regions on each page and, based on that information, splits the page into multiple low-resolution pages. Unnecessary margins are also removed automatically.
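The post doesn't say how the region detection works (a later post only says that processing is done at the pixel level). Below is a minimal sketch of one common approach, a horizontal projection profile, assuming the page has already been rasterized and thresholded into rows of 0/1 pixels (1 = ink). The names `find_bands`, `ink_columns`, `detect_regions` and the `min_gap` parameter are hypothetical, not PaperCrop's code or API.

```python
from typing import List, Tuple

Page = List[List[int]]  # assumed input: binarized page, rows of 0/1 pixels (1 = ink)


def find_bands(page: Page, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges of contiguous non-empty rows.

    Blank runs shorter than min_gap rows (e.g. spacing between text lines)
    are absorbed into the surrounding band; wider gaps separate bands.
    """
    bands: List[Tuple[int, int]] = []
    start = last_ink = -1
    for y, row in enumerate(page):
        if not any(row):
            continue  # blank row: only matters as part of a gap
        if start < 0:
            start = y
        elif y - last_ink - 1 >= min_gap:
            bands.append((start, last_ink))  # gap was wide enough: close the band
            start = y
        last_ink = y
    if start >= 0:
        bands.append((start, last_ink))
    return bands


def ink_columns(page: Page, top: int, bottom: int) -> Tuple[int, int]:
    """Return the leftmost and rightmost inked columns within rows top..bottom."""
    cols = [x for x in range(len(page[0]))
            if any(page[y][x] for y in range(top, bottom + 1))]
    return cols[0], cols[-1]


def detect_regions(page: Page, min_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes with the surrounding margins trimmed."""
    return [(top, bottom, *ink_columns(page, top, bottom))
            for top, bottom in find_bands(page, min_gap)]
```

This only cuts horizontally; a real tool would also need to split each band vertically to separate side-by-side columns (for example with a recursive X-Y cut).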

Download: PaperCrop

Screenshots: input PDF, output PDF

Currently, only Windows is supported (though it works on other platforms through Wine).
There may be some bugs.
Thanks,

- Version 0.24 uploaded. Source code is available too.
- Version 0.3 uploaded. (All 0.24 users should upgrade to this version; sorry for the crash problem. Version 0.3 outputs to a PDF file. Could anybody please test the output PDF file on a Sony Reader?)
- Version 0.4 uploaded.

Last edited by Taesoo Kwon; 03-20-2010 at 02:58 PM.
11-09-2008, 09:10 PM   #2
=X=
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Nice tool! It would be great if this code could be added to PDFRead, which needs a feature like this.

=X=
11-10-2008, 05:52 AM   #3
Taesoo Kwon
Enthusiast
 
Posts: 27
Karma: 163
Join Date: Nov 2008
Device: Kobo wifi
Yes, I guess so. PDFRead supports a command-line mode, and PaperCrop uses Lua scripts to generate its output. So anyone familiar with Lua can modify the .lua files in the scripts folder so that the output images from PaperCrop are automatically converted to e-book files using PDFRead. But at the moment, I don't want to do the work myself due to my laziness.
I can provide the PaperCrop source code to anyone who is interested.
11-10-2008, 10:20 AM   #4
Pulp
Palm Addict
Posts: 477
Karma: 1001951
Join Date: Aug 2008
Device: Cybook Gen3 [512mb, FW: 1.5]
Thank you, this is a great tool!
11-10-2008, 11:52 AM   #5
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 T1 NSTG iLiad_v2 NC Asus_TF Next1 WPDN
Quote:
Originally Posted by Taesoo Kwon
Yes, I guess so. PDFRead supports a command-line mode, and PaperCrop uses Lua scripts to generate its output. So anyone familiar with Lua can modify the .lua files in the scripts folder so that the output images from PaperCrop are automatically converted to e-book files using PDFRead. But at the moment, I don't want to do the work myself due to my laziness.
I can provide the PaperCrop source code to anyone who is interested.
PDFRead v1.8.2 already supports converting two-column layouts using the layout modes:
Code:
'landscape-2col' (with four quadrants/pages); 
'portrait-2col' (with four quadrants/pages);
However, PDFRead is not too "smart" in how it determines where the columns start/end; it just picks the midpoint of the page and splits it there! Of course, this will be wrong if the column widths (and side margins) are not equal.
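To illustrate the point about unequal columns, here is a sketch of a slightly smarter split: instead of always cutting at `width // 2`, search a window around the midpoint for the widest run of blank pixel columns (the gutter) and cut at its center. This is not PDFRead's or PaperCrop's code; the binarized-page input, the function name `find_gutter`, and the `window` parameter are assumptions for illustration.

```python
from typing import List, Optional

Page = List[List[int]]  # assumed input: binarized page, rows of 0/1 pixels (1 = ink)


def find_gutter(page: Page, window: float = 0.25) -> int:
    """Return the x coordinate at which to split a two-column page.

    Looks for the widest run of blank columns within +/- window * width of
    the midpoint and returns its center. Falls back to the plain midpoint
    when no blank column exists in that window.
    """
    width = len(page[0])
    mid = width // 2
    lo = max(0, int(mid - window * width))
    hi = min(width, int(mid + window * width) + 1)
    blank = [not any(row[x] for row in page) for x in range(width)]

    best_start: Optional[int] = None
    best_len = 0
    run_start: Optional[int] = None
    for x in range(lo, hi):
        if blank[x]:
            if run_start is None:
                run_start = x  # a new blank run begins
            run_len = x - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # ink breaks the run
    if best_start is None:
        return mid
    return best_start + best_len // 2
```

With a wide left column and a narrow right one, the midpoint would cut through the left column, while the gutter search cuts in the blank space between them.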

I looked into Lua programming when I was porting/tweaking some PSP homebrew programs, so it would be easy to reuse your programming logic.

However, PDFRead is due for a major overhaul, so I will hold off on this for now. I'll wait to see what ashkulz (the original author of PDFRead) does with any update to PDFRead and then go from there.

Nice effort though!

Last edited by nrapallo; 11-10-2008 at 12:07 PM. Reason: added link to latest version of PDFRead
11-12-2008, 04:02 AM   #6
soilwork
useR!
Posts: 299
Karma: 651
Join Date: Nov 2007
Location: NY
Device: Onyx Boox Max 2, Kobo Libra H2O, iRiver Story HD
I tried the program, and it looks and works excellently. In particular, it detects content well even when a wide table spans the whole page width while the rest of the content is arranged in two columns.

I do have a few suggestions, though.

1) Pre-trim option
In most articles, headers/footers are unnecessary, especially on a small-screen reading device. It would be great if you could implement a pre-trim option (like the one in PDFLRF) that is applied before the content is detected.

2) Prevent cutting text in the middle
I noticed that, in some cases, a line of text is cut in the middle. Since the program already does a great job of detecting content, could you apply similar logic to prevent this when cutting the detected content into smaller GIF/JPG/PNG images?

3) An easier way to enter precise segmentation parameters
To make fine changes to the segmentation, I have to press Tab to highlight the slider and then use the left/right arrow keys to change the value in the smallest increment. It would be easier if
A. double-clicking the bar highlighted it, and/or
B. double-clicking the displayed number let users enter the value directly.
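Suggestions 1 and 2 could look roughly like this, again assuming a binarized page given as rows of 0/1 pixels. `pretrim` drops a fixed fraction of rows at the top and bottom before detection, and `split_rows` moves each cut up to a blank row so a text line isn't sliced. Both names and their parameters are hypothetical, not features of PaperCrop or PDFLRF.

```python
from typing import List, Tuple

Page = List[List[int]]  # assumed input: binarized page, rows of 0/1 pixels (1 = ink)


def pretrim(page: Page, top: float = 0.0, bottom: float = 0.0) -> Page:
    """Drop a fixed fraction of rows at the top and bottom (header/footer)."""
    h = len(page)
    return page[int(h * top): h - int(h * bottom)]


def split_rows(page: Page, max_height: int) -> List[Tuple[int, int]]:
    """Split rows into half-open (start, end) pieces of at most max_height rows.

    If the row just below a cut contains ink, the cut is moved up to the
    last blank row in the piece so a text line is not sliced in half; if the
    piece has no blank row, it falls back to a hard cut at max_height.
    """
    height = len(page)
    blank = [not any(row) for row in page]
    pieces: List[Tuple[int, int]] = []
    start = 0
    while start < height:
        end = min(start + max_height, height)
        if end < height and not blank[end]:
            # search upward (but never back to start) for a blank row
            for y in range(end - 1, start, -1):
                if blank[y]:
                    end = y
                    break
        pieces.append((start, end))
        start = end
    return pieces
```

The fallback keeps the loop making progress on dense pages (e.g. a figure taller than the screen), at the cost of an occasional hard cut.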

BTW, thanks for providing an excellent program.

Last edited by soilwork; 11-12-2008 at 04:32 AM. Reason: Added suggestions
11-12-2008, 12:47 PM   #7
=X=
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
I've just used the software on a very complicated layout, and it worked quite nicely. I'm quite impressed.

Suggestions
* Add an option in the UI to choose JPG/GIF/PNG output. Having to change the code is a bit cumbersome.
* Add a crop box that applies to all windows.
* It would be nice if customized crop settings were saved per page, so that all the adjustments can be made before the bulk conversion is run.
* It would be nice if the final product were an e-book. If not, maybe write a short tutorial here on how a user can create an e-book using calibre or comic2LRF: an LRF can be created by zipping up the files with a CBZ extension and running these tools on the zip file.
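A minimal sketch of the CBZ step described above, using Python's standard `zipfile` module. It assumes the output images sit in one folder with zero-padded names (like the `00000_000.jpg` files PaperCrop writes), so that sorting by name gives reading order; the function name `make_cbz` and the paths are placeholders. The resulting .cbz is what you would then hand to calibre or comic2LRF.

```python
import zipfile
from pathlib import Path
from typing import List


def make_cbz(image_dir: str, cbz_path: str, pattern: str = "*.jpg") -> List[str]:
    """Zip the images in image_dir, sorted by filename, into a .cbz archive.

    A CBZ is a plain ZIP archive of page images, and pages are generally
    taken in filename order, so zero-padded names keep the order correct.
    Returns the archive member names in the order they were written.
    """
    images = sorted(Path(image_dir).glob(pattern))
    names: List[str] = []
    # JPEG/GIF/PNG data is already compressed, so store without recompressing.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for img in images:
            zf.write(img, arcname=img.name)
            names.append(img.name)
    return names
```

Usage, with placeholder paths: `make_cbz("output", "paper.cbz")`, then open or convert `paper.cbz` with the tool of your choice.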


Bugs/Issues
* Adjust crop settings according to font size. On some pages the column spacing was quite tight, so I had to decrease the column width. However, the story titles had larger fonts, where the spacing between words equaled
* In some cases the text was cut in half.
* The crop overlaps on some pages: part of the second column shows up on the first column's screen. The second column shows up fine on the following screens, but this is a bit distracting.

=X=
11-12-2008, 05:40 PM   #8
Taesoo Kwon
Enthusiast
 
Posts: 27
Karma: 163
Join Date: Nov 2008
Device: Kobo wifi
Thank you for the suggestions, =X= and Soilwork.
I will implement several of the suggestions and bug fixes in the next version,
e.g. a crop box that applies to all windows, a pre-trim option, and font-size-dependent processing. (The last one is very difficult for me to implement: currently all processing is done at the pixel level, without using any PDF information such as font sizes, PDF crop boxes, and so on.)

I will also release the source code under the free GPL license. (Actually, this is mandatory, since I used some GPL libraries.)

At the moment, supporting e-book formats is not something I want to spend much time on (simply because I started this project for my own needs, and I don't need that functionality). Sorry.
11-16-2008, 09:10 AM   #9
ProDigit
Karmaniac
 
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Sorry for the double post, but you convert PDFs to 800x600-pixel JPGs.
The screen itself has a small bar at the bottom.
Wouldn't it be better to convert to something like 790x600 pixels?
Just a question.
11-16-2008, 03:48 PM   #10
=X=
Wizard
 
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Moderators, can you please make this thread a sticky?
11-16-2008, 04:01 PM   #11
zelda_pinwheel
zeldinha zippy zeldissima
 
Posts: 27,827
Karma: 921169
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
Quote:
Originally Posted by =X=
Moderators, can you please make this thread a sticky?
stuck.
11-16-2008, 05:27 PM   #12
Taesoo Kwon
Enthusiast
 
Posts: 27
Karma: 163
Join Date: Nov 2008
Device: Kobo wifi
Quote:
Originally Posted by ProDigit
Wouldn't it be better to convert to something like 790x600 pixels?
Just a question.
I don't know exactly what the best resolution for the Sony Reader is.
The Cybook and nuutbook (a Netronix variant, which I use) display images at full screen, so 600x800 was best for those devices.
The resolution can be changed by editing config.lua.
11-17-2008, 12:53 AM   #13
soilwork
useR!
 
Posts: 299
Karma: 651
Join Date: Nov 2007
Location: NY
Device: Onyx Boox Max 2, Kobo Libra H2O, iRiver Story HD
Bugs in PaperCrop 0.24

Hello, I found some bugs in version 0.24

- Every time I run 'process all pages', at the end I get a dialog box saying
"Do you want to overwrite file \...\00000_000.jpg?"
If I answer yes, the last image overwrites the first file (00000_000.jpg).
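The thread doesn't say what causes this overwrite. One defensive pattern, sketched here on the assumption that names follow the `00000_000.jpg` layout shown in the dialog (a 5-digit page index and a 3-digit piece index), is to derive every filename from the (page, piece) pair and refuse to hand out the same name twice. `OutputNamer` is a hypothetical helper, not part of PaperCrop.

```python
from typing import Set


class OutputNamer:
    """Hypothetical helper: hands out names like 00000_000.jpg, never the same one twice."""

    def __init__(self, ext: str = "jpg") -> None:
        self.ext = ext
        self.used: Set[str] = set()

    def claim(self, page: int, piece: int) -> str:
        # The zero-padded layout only has room for 5 + 3 digits.
        if not (0 <= page < 100_000 and 0 <= piece < 1_000):
            raise ValueError("index does not fit the zero-padded name")
        name = f"{page:05d}_{piece:03d}.{self.ext}"
        if name in self.used:
            # Fail loudly instead of silently overwriting an earlier output.
            raise FileExistsError(name)
        self.used.add(name)
        return name
```

Raising an error surfaces an indexing mistake at the point it happens, rather than as an overwrite prompt at the end of a long batch.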

- In some cases, the original PDF uses one font throughout the page, but the converted results look different.
Please refer to the attached pictures. In the original PDF, all of them use the same font.
This bug does not affect usability, but it is a bit strange.

- The program crashes with some files. If you like, I can email you the PDF files that cause the problem.

Thanks again for the great program.
Attached thumbnails: original.png (73.3 KB), converted.jpg (326.4 KB)
11-17-2008, 11:20 AM   #14
ProDigit
Karmaniac
 
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Quote:
Originally Posted by Taesoo Kwon
I don't know exactly what the best resolution for the Sony Reader is.
The Cybook and nuutbook (a Netronix variant, which I use) display images at full screen, so 600x800 was best for those devices.
The resolution can be changed by editing config.lua.
Yes, I understand that, since the readers use an 800x600 screen.
However, there is a small bar at the bottom of the reader, about 5 mm (0.2") tall. I imagine every reader has this bar, showing battery life, page info, or whatever...
Is this bar overlaid on the picture, or does it take some space away from it?
I ask because storing a book (a manga, for instance) of 200 JPEG-compressed pages at 800x600, or say 790x600, could save anywhere from a few hundred kB to a few MB.
You wouldn't notice the difference anyway, and the reader could then display the image on the screen as-is, without internal resizing and rendering.

Yesterday I thought my reader had frozen when I inserted an SD card with a photo my wife took with her Nikon D40 camera (6 MP, 4.6 MB). It took about 5 to 7 minutes to render the picture the first time.
After editing (resizing to 800x600 and saving as B&W), the image took up only 96 kB (under 100 kB).
You couldn't see the difference between the two in full-screen view, I saved more than 4 MB, and the reader rendered my 96 kB image within 2 seconds.
For books that need no zooming in or tiling to landscape, it is much better to save them this way.
Taking this further, I wanted to know if anyone knows anything about the bar displayed:
does it overlay the image, or take space away from the screen?
I believe that if the reader does not need to render or resize the image, some battery life and loading time could be saved.

Last edited by ProDigit; 11-17-2008 at 11:45 AM.
11-17-2008, 11:46 AM   #15
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 T1 NSTG iLiad_v2 NC Asus_TF Next1 WPDN
I thought the maximum dimensions on the Sony e-book readers were 584 (width) x 754 (height).

I got this from page 4 of the Sony recommended pdf size.
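Pre-sizing images to the screen, as ProDigit suggests, comes down to an aspect-preserving fit. A sketch, assuming the target box is a parameter (600x800 for the full-screen devices mentioned earlier, or the 584x754 figure above for the Sony readers, since whether the status bar overlays the image is still an open question here). `fit_size` is an illustrative name; it uses integer arithmetic so the result never exceeds the box and never upscales.

```python
from typing import Tuple


def fit_size(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Uses floor division so the result never exceeds the target box; images
    that already fit are returned unchanged (no upscaling).
    """
    if width <= max_w and height <= max_h:
        return width, height
    # width/height >= max_w/max_h  <=>  width*max_h >= height*max_w (no floats)
    if width * max_h >= height * max_w:
        # width is the limiting side
        return max_w, max(1, height * max_w // width)
    # height is the limiting side
    return max(1, width * max_h // height), max_h
```

For example, `fit_size(3000, 2000, 800, 600)` gives `(800, 533)`, and `fit_size(1200, 1600, 584, 754)` gives `(565, 754)`.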