11-20-2008, 10:01 PM   #35
Taesoo Kwon
Enthusiast
Posts: 27
Karma: 163
Join Date: Nov 2008
Device: Kobo wifi
CAA.pdf

Quote:
Originally Posted by nrapallo View Post
I just tried to get two column output using PaperCrop 0.3 and this .pdf, but wasn't able to.
What algorithm would you suggest in getting this .pdf converted using a combination of reflow and/or two-column support?
That PDF file is really challenging. The current algorithm cannot correctly separate non-convex regions that overlap both horizontally and vertically. Good algorithms for this case may exist, but I don't know of any at the moment. (I am not an expert in document segmentation.) Also, any such algorithm would probably increase processing time.

Of course, it is easy to segment the PDF file the PDFRead way (simply dividing a page into two regions). I may add this oversimplified but robust segmentation method to PaperCrop as an option someday.
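
To make the "divide a page into two regions" idea concrete, here is a rough sketch. It is not the actual PaperCrop or PDFRead code; the function name split_page_in_two, the overlap parameter, and the (left, top, right, bottom) pixel box layout are all made up for illustration. It assumes the column gutter sits near the horizontal center of the page.

```python
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def split_page_in_two(width: int, height: int, overlap: int = 0) -> List[Box]:
    """Cut a page into a left and a right region at the horizontal midpoint.

    `overlap` (an assumption, not a PaperCrop option) widens both regions
    past the midpoint so text sitting close to the gutter is not clipped.
    """
    if width <= 0 or height <= 0:
        raise ValueError("page dimensions must be positive")
    if overlap < 0 or overlap > width // 2:
        raise ValueError("overlap must be between 0 and half the page width")

    mid = width // 2
    left_box = (0, 0, mid + overlap, height)
    right_box = (mid - overlap, 0, width, height)
    # Reading order for a two-column page: left column first, then right.
    return [left_box, right_box]


if __name__ == "__main__":
    for box in split_page_in_two(1000, 1400, overlap=20):
        print(box)
```

This ignores the page content completely, which is why it is robust on a layout like CAA.pdf but also why it fails on any page where the gutter is not in the middle or where a figure spans both columns.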

Last edited by Taesoo Kwon; 11-20-2008 at 10:12 PM.