02-17-2013, 12:19 PM   #46
Turtle91
Guru
Turtle91 ought to be getting tired of karma fortunes by now.
Posts: 669
Karma: 3807234
Join Date: Dec 2012
Location: Shannon, Ireland today
Device: iPhone 5/iPad 1&2/Surface Pro/Kindle PW
Quote:
Originally Posted by BeccaPrice
With 1dollarscan, the PDF is OCR'd. What I did was save the PDF as a TXT file (I've got Acrobat Pro, so I can do that), and then had a file I could edit for scanning errors.

I was under the impression that Acrobat, even Pro, doesn't keep the formatting when you save to text, in which case you would lose all of the italics, bold, superscript, etc.

Even assuming the PDF they give you is a perfect OCR of the original, you would still need to go back and manually reformat the entire book to match the original.

I ran an experiment: I created a test page in Word with different formatting applied to different sections of text, then saved that document as a PDF. This stands in for a "perfect OCR of the original image". When I opened that PDF in Acrobat Pro, everything looked as it should, and I could run a find on any of the words in it. I then saved the PDF as text. Acrobat gives two options, Plain text and Accessible text, and I tried both. In both cases the text was correct but without ANY formatting.
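
As a rough illustration of why that happens: a plain-text file can only hold characters, so any style attached to a run of text has nowhere to go. The Python sketch below is not how Acrobat works internally; the `Run` type, the two export functions, and the sample sentence are all made up for the example. It only shows that flattening styled runs to text drops the bold, italic, and superscript, while a markup format such as HTML can carry them.

```python
import html
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of text sharing one style (hypothetical model, not Acrobat's)."""
    text: str
    bold: bool = False
    italic: bool = False
    superscript: bool = False


def to_plain_text(runs):
    """Keep only the characters, the way a plain-text export does."""
    return "".join(run.text for run in runs)


def to_html(runs):
    """Keep the style by wrapping each run in simple HTML tags."""
    parts = []
    for run in runs:
        piece = html.escape(run.text)
        if run.superscript:
            piece = f"<sup>{piece}</sup>"
        if run.italic:
            piece = f"<i>{piece}</i>"
        if run.bold:
            piece = f"<b>{piece}</b>"
        parts.append(piece)
    return "".join(parts)


# Made-up sample text, standing in for a formatted test page.
sample = [
    Run("Plain, "),
    Run("bold", bold=True),
    Run(", "),
    Run("italic", italic=True),
    Run(", and E=mc"),
    Run("2", superscript=True),
    Run("."),
]

print(to_plain_text(sample))
print(to_html(sample))
```

In the plain-text result the superscript 2 becomes an ordinary 2 and the bold and italic words look like any others, which matches what both text exports showed in the experiment.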

If there is a different way of saving a PDF to text, I would be very interested to know how.

Sample OCR text.pdf
Sample OCR text - plain.txt
Sample OCR text - accessible.txt