MobileRead Forums - View Single Post

Turtle91 · 02-17-2013, 12:19 PM

Quote:

Originally Posted by BeccaPrice

With 1dollarscan, the PDF is OCR'd. What I did was save the PDF as a TXT file (I've got Acrobat Pro, so I can do that), and then had a file I could edit for scanning errors.

I was under the impression that Acrobat - even Pro - doesn't keep the formatting when you save to text, in which case you will not have any of the italics, bold, superscript, etc.

Even assuming the PDF they give you is a perfect OCR of the original, you would still need to go back and manually reformat the entire book to match the original.

I did an experiment by creating a test page in Word with different formatting applied to different sections of text. I then saved that document as a PDF. Because the PDF comes straight from Word, its text layer is exact, so it stands in for a "perfect OCR of the original image". When I opened that PDF in Acrobat Pro, everything looked as it should and I could perform a find on any of the words in there. I then saved the PDF as text. Acrobat gives two options, Plain text and Accessible text, and I did both. In both cases the text was correct but without ANY formatting.
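For anyone who wants to repeat the comparison without reading the two exports side by side, here is a minimal Python sketch that checks whether two text files contain the same words in the same order, ignoring case and line-break differences. The file names are the ones attached to this post; the helper names (normalized_words, same_words) and the UTF-8 encoding are assumptions for illustration, not anything Acrobat provides. Note that it can only confirm the words match. It cannot detect formatting, since none survives in a text export.

```python
import re
from pathlib import Path


def normalized_words(text: str) -> list[str]:
    """Split text into lowercase words, discarding whitespace and punctuation."""
    return re.findall(r"\w+", text.lower())


def same_words(text_a: str, text_b: str) -> bool:
    """True if both texts contain the same words in the same order."""
    return normalized_words(text_a) == normalized_words(text_b)


if __name__ == "__main__":
    # File names taken from the attachments; adjust the paths as needed.
    plain = Path("Sample OCR text - plain.txt")
    accessible = Path("Sample OCR text - accessible.txt")
    if plain.exists() and accessible.exists():
        # Assumption: the exports are UTF-8; undecodable bytes are replaced.
        a = plain.read_text(encoding="utf-8", errors="replace")
        b = accessible.read_text(encoding="utf-8", errors="replace")
        print("Same words:", same_words(a, b))
    else:
        print("Export files not found; nothing to compare.")
```

If the two exports differ only in how lines are wrapped, this reports them as the same, which is the result I would expect from the experiment above.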

If there is a different way of saving a PDF to text, I would be very interested to know how.

Sample OCR text.pdf
Sample OCR text - plain.txt
Sample OCR text - accessible.txt