09-27-2012, 10:42 AM   #5
Man Eating Duck
Addict
Man Eating Duck juggles neatly with hedgehogs.
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by looloo
Yeah, but I can only find this file in PDF format. What's this OCR thing you mentioned and how do I apply it to my PDF file? Thanks
OCR stands for Optical Character Recognition. It is performed by a program which "reads" the image of each page and translates it into editable text, and you need specialised software for this. The full version of Acrobat can do basic OCR, which it then overlays on your document. Specialised tools like FineReader are more advanced and will give you some approximation of the layout, such as tables, indents and the like. I'm not aware of any good free tools; my limited experience with them indicates that they have a way to go yet.

No matter which software you use, you will have to correct a great many misreads. Depending on the quality of your images, recognition accuracy will usually vary from about 80% to 98%, so even in the best case you can expect a couple of misread characters in every hundred. The software will also have to guess the semantic layout, like which text is a heading and whether a paragraph crosses a page boundary or is really *two* paragraphs (this is not always clear even to a human reader).
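To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It reads the 80%-98% range as the share of characters recognised correctly, and the figure of 2000 characters per page is purely my assumption for illustration (as is the helper name), so adjust it for your own book.

```python
def expected_misreads(chars_per_page: int, accuracy: float) -> int:
    """Rough number of misread characters on one page at a given accuracy."""
    # accuracy is the fraction of characters recognised correctly (0.0 to 1.0)
    return round(chars_per_page * (1.0 - accuracy))


# Assumption for illustration only: a typical page holds about 2000 characters.
CHARS_PER_PAGE = 2000

for accuracy in (0.80, 0.90, 0.98):
    misreads = expected_misreads(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.0%} accuracy: about {misreads} misreads per page")
```

Even at the good end of the range that is dozens of corrections on every page, which is why the proofreading after OCR is where most of the work goes.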

As Ti-Ron states, it is generally a lost cause to convert PDFs to a meaningful format. Read them as they are, or prepare to do a significant amount of work after conversion if quality is important to you.

Good luck!