02-21-2008, 04:17 AM   #2
kacir
Wizard
Posts: 3,463
Karma: 10684861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
Quote:
Originally Posted by mflood View Post
Is there a program that will properly convert two column PDF files to a readable format?
I am afraid it would be very difficult to find such a program. A PDF file does not carry any information about the text as such. All it knows is where each individual character is placed on the page, so nothing in the file says which column a character belongs to.
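To illustrate why two columns are a problem, here is a minimal sketch that works only from character positions, the way a PDF stores them. The glyph data, the exact-equal `y` values per line, and the hypothetical `split_x` column boundary are all assumptions for illustration; a real converter has to guess the column boundary and tolerate slightly different baselines.

```python
from typing import Dict, List, Tuple

# One glyph as a PDF places it: (x, y, character).
# In PDF coordinates y grows upward, so a larger y is higher on the page.
Glyph = Tuple[float, float, str]


def read_lines(glyphs: List[Glyph]) -> str:
    """Group glyphs by baseline (assumes identical y per line), read top to bottom, left to right."""
    lines: Dict[float, List[Tuple[float, str]]] = {}
    for x, y, ch in glyphs:
        lines.setdefault(y, []).append((x, ch))
    return "\n".join(
        "".join(ch for _, ch in sorted(lines[y])) for y in sorted(lines, reverse=True)
    )


def two_column_order(glyphs: List[Glyph], split_x: float) -> str:
    """Read the whole left column first, then the right one (split_x is a guessed boundary)."""
    left = [g for g in glyphs if g[0] < split_x]
    right = [g for g in glyphs if g[0] >= split_x]
    return read_lines(left) + "\n" + read_lines(right)


if __name__ == "__main__":
    page = [
        (10, 700, "A"), (20, 700, "B"), (310, 700, "E"), (320, 700, "F"),
        (10, 690, "C"), (20, 690, "D"), (310, 690, "G"), (320, 690, "H"),
    ]
    print(read_lines(page))             # naive: lines run across both columns
    print(two_column_order(page, 300))  # column by column
```

Reading straight across the page mixes the two columns line by line ("ABEF", "CDGH"), which is exactly the garbled result a simple text extractor gives on a two-column layout.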

When I needed to extract text from such a PDF file, I used the following procedure.

0. Install Ghostscript (GSview needs it).
1. Install GSview. Installation files for Ghostscript and GSview are at http://pages.cs.wisc.edu/~ghost/gsview/
2. Open the PDF in GSview and convert it to bitmaps.
3. Run the resulting bunch of bitmaps through an OCR program. I got a decent OCR program bundled with an HP multifunction printer/scanner/fax that my company had purchased.
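If you prefer to skip the GUI, step 2 can also be done with the Ghostscript command line directly. The sketch below only builds the command lists; the output file pattern, the 300 dpi resolution and the grayscale `pnggray` device are my assumptions (grayscale at around 300 dpi is usually fine for OCR), and `tesseract` is a swapped-in free OCR engine, not the HP program used above. On Windows the Ghostscript console executable is called `gswin32c` or `gswin64c` instead of `gs`.

```python
import shlex
import subprocess
from typing import List


def gs_render_command(pdf_path: str, out_pattern: str = "page-%03d.png", dpi: int = 300) -> List[str]:
    """Ghostscript command that renders every PDF page to its own grayscale PNG."""
    return [
        "gs",
        "-dSAFER",     # restrict file operations while interpreting the PDF
        "-dBATCH",     # exit when done instead of waiting at a prompt
        "-dNOPAUSE",   # do not pause between pages
        "-sDEVICE=pnggray",
        f"-r{dpi}",
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        pdf_path,
    ]


def ocr_command(image_path: str, out_base: str) -> List[str]:
    """Tesseract writes the recognized text to <out_base>.txt (swapped-in OCR tool)."""
    return ["tesseract", image_path, out_base]


if __name__ == "__main__":
    cmd = gs_render_command("paper.pdf")
    print(shlex.join(cmd))
    # Uncomment to actually run it (requires Ghostscript on the PATH):
    # subprocess.run(cmd, check=True)
    # subprocess.run(ocr_command("page-001.png", "page-001"), check=True)
```

The OCR engine then has to find the columns on the page image itself, which is usually easier than reconstructing them from raw character positions.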

I know, it is complicated. I just needed the text for work-related purposes, I did not like the prospect of typing it or of copying and pasting small chunks of text from Acrobat Reader, and these were the tools I had at hand in a hurry ;-)