Quote:
Originally Posted by vivaldirules
...There are lots of things at Google Books, the Internet Archive, and elsewhere that are in the public domain but only in PDF format or some on-line viewable flip thing. I'd love to be able to seamlessly (hah!) select, copy, paste, and OCR the text from such images. But I don't know the best way to go and would hate to spend time and money on the wrong thing. Any advice?
This may be a very elementary response, but fwiw, I generally export PDF items to text via the File, Save As Text menu option. Then comes the tedium of eliminating page headings etc. I use NotetabLite to clean up, remove hard line endings and so on, and sort out accents and diphthongs before transferring the text to Sigil. (Incidentally, depending on your intentions, it may or may not be ethical to unlock secured PDFs so you can do this format shifting.)
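
For anyone who would rather script the cleanup than do it by hand, here is a rough Python sketch of the same idea. To be clear, this is not what NotetabLite does internally, just one way to approximate the steps. The function names and the `min_repeats` threshold are made up for illustration, and it assumes that running headers repeat (apart from the page number) on at least three pages and that blank lines mark paragraph breaks. It does not touch accents or diphthongs.

```python
import re
from collections import Counter


def strip_page_furniture(lines, min_repeats=3):
    """Drop bare page numbers and lines that repeat on many pages."""

    # Replace digits so "MY BOOK 12" and "MY BOOK 13" count as the same line.
    def key(line):
        return re.sub(r"\d+", "#", line.strip())

    counts = Counter(key(line) for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped.isdigit():
            continue  # a bare page number
        if stripped and counts[key(line)] >= min_repeats:
            continue  # assumed to be a running header or footer
        kept.append(line)
    return kept


def unwrap_paragraphs(lines):
    """Join hard-wrapped lines; blank lines are treated as paragraph breaks."""
    paragraphs, current = [], ""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Rejoin a word split across lines. Caveat: this also swallows
            # genuine hyphens that happen to fall at a line end.
            current = current[:-1] + stripped
        elif current:
            current += " " + stripped
        else:
            current = stripped
    if current:
        paragraphs.append(current)
    return paragraphs


def clean_pdf_text(text, min_repeats=3):
    """Remove page furniture, then unwrap what is left into paragraphs."""
    lines = strip_page_furniture(text.splitlines(), min_repeats)
    return "\n\n".join(unwrap_paragraphs(lines))
```

You would read in the exported text file, pass its contents to `clean_pdf_text`, and paste the result into Sigil. Expect to proofread afterwards: a short line that legitimately repeats three or more times would be dropped as if it were a header, so raise `min_repeats` if that bites.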