Old 12-15-2015, 07:22 AM   #8
wubuer
Quote:
Originally Posted by BetterRed
@François - I should have written - I think some pdf readers... include lightweight online OCR...

I'm not aware that the 'text' can be inserted 'behind' the image within the PDF itself. But I am not as up to speed with these issues as I once was - so perhaps that is what's happening.

I ran your PDF through the MobiCreator PDF converter - I've put the output into an attached zip - it's interesting, even if not very useful - but it does at least contain the text.

I'm told that the Google on-line OCR PDF scanner is as good as or better than some of the free PDF OCR scanners, but I don't know if it does bulk scanning. Given that you're looking at 'old documents', it is possible that Google have already OCR'd them - maybe the University would know that.

I also rescanned the PDF with Omnipage. After some tweaks to the settings I was able to remove page numbers and improve some other things, but the output was still in need of 'tidying up'. I doubt there is a solution to that: Project Gutenberg uses volunteer proofreaders to do the tidying up, and I'm not sure what Google does.

BR
Thanks for the information, it's useful.