MobileRead Forums - View Single Post

bobcdy · 09-26-2010, 12:13 AM

The most obvious cause of your problem is that you have an image PDF rather than a text PDF. As you probably know, there are two types of PDF files. The first is created from a set of images of the text (and figures), like photographs of the pages of a book; this is an image PDF. The other is created from a text document, for example when you write something in MS Word and save it as a PDF (provided you have the right add-in for Word); this is a text PDF. If you can search for a word or phrase in the PDF, you have a text PDF; if it isn't searchable, you have an image PDF. In other words, whether a PDF is an image PDF or a text PDF has nothing to do with whether it contains images such as diagrams, figures, or photographs; it depends on what kind of material the PDF was created from.
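If you'd like to check this without opening the file in a viewer, here is a rough Python sketch of the same searchability test. It is a heuristic, not a real PDF parser: it assumes the page content streams are either uncompressed or Flate-compressed (the usual case), looks for the text-drawing operators (BT ... Tj/TJ ... ET) that the pages of a text PDF contain and a scanned PDF's pages lack, and gives up on encrypted files. The names `classify_pdf` and the `_..._RE` patterns are made up for this example; a proper PDF library will be more reliable.

```python
import re
import zlib

# Rough heuristic only: this scans the raw PDF bytes with regular expressions
# instead of parsing the file properly, so unusual PDFs can fool it.

# A stream object: its dictionary (<< ... >>), then the bytes between
# the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL)
# Image XObjects hold pixel data, not page-drawing instructions.
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# A text block: BT (begin text) ... Tj/TJ (show text) ... ET (end text).
_TEXT_OP_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def classify_pdf(data: bytes) -> str:
    """Guess whether a PDF has a searchable text layer.

    Returns "encrypted", "text", or "image" (no text-drawing operators
    were found, which usually means the pages are scanned images).
    """
    if b"/Encrypt" in data:
        # Encrypted (e.g. copy-protected) streams can't be inspected this way.
        return "encrypted"
    for header, body in _STREAM_RE.findall(data):
        if _IMAGE_RE.search(header):
            continue  # skip embedded pictures
        if b"/FlateDecode" in header:
            try:
                # decompressobj ignores any trailing bytes after the zlib data.
                body = zlib.decompressobj().decompress(body)
            except zlib.error:
                continue  # corrupt or unexpected data: skip this stream
        if _TEXT_OP_RE.search(body):
            return "text"
    return "image"
```

Usage, with a hypothetical file name: `classify_pdf(open("book.pdf", "rb").read())`. A scanned book that has already been run through OCR will usually report "text", because the OCR program adds an invisible text layer, which is also what makes it searchable.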

If an image PDF is the cause of your problem, you can't easily convert it. You first have to convert all the page images to text with an OCR program, and then correct the errors that OCR always introduces. This is very time-consuming.

I suppose there could be other causes for your problem; perhaps the PDF file is copy-protected in some way?

Bob

Last edited by bobcdy; 09-26-2010 at 12:18 AM. Reason: typos, slight changes in text