MobileRead Forums - View Single Post

DMcCunney · 07-23-2008, 11:19 PM

Quote:

Originally Posted by Elsi

On the Google Books page, there's an option to view plain text. The "viewer" is some strange thing, but you can -- if you're careful -- copy/paste into a text document. I did this with a 200 page book. It was tedious and the viewer threw up some repeated text, but if you're patient, it may be easier than trying to OCR the PDF file. (Of course, I've never OCRed a PDF file, so it may be easier than I am thinking it would be.)

Depends on the PDF file and the OCR software.

If the PDF contains text, OCR may not be necessary: unlocked PDFs will have an option to save the text to a file. You lose images, fonts, formatting and the like, but you get the text.

Other PDFs are simply collections of images. Those would need to be run through OCR, if possible. You would also need to do substantial editing and cleanup. No OCR software guesses right all the time, and image quality is a factor. Ligatures are a particular problem.
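
To give an idea of what part of that cleanup can look like, here is a minimal Python sketch. It assumes the OCR or text export produced Unicode ligature characters (the single-glyph "fi", "fl" and so on) instead of plain letters. The function name expand_ligatures and the sample string are made up for illustration and don't come from any particular OCR package.

```python
# Map the common Latin ligature code points (Unicode "Alphabetic
# Presentation Forms" block) to the plain letters they stand for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_ligatures(text: str) -> str:
    """Replace each ligature character in text with its plain letters."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text


# Hypothetical line of OCR output containing three ligature characters.
sample = "The \ufb01rst o\ufb03ce \ufb02oor"
print(expand_ligatures(sample))  # prints "The first office floor"
```

This only handles the case where the ligature came through as its own character. If the OCR engine misread the ligature as some other letter entirely, that still has to be caught by proofreading.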

The PDF in question is a collection of images of page scans, and the View as Text is the result of OCR.
______
Dennis