12-23-2009, 10:46 PM   #25
alecE
Evangelist
Posts: 412
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650, kobo Glo HD liseuses
Quote:
Originally Posted by vivaldirules
...There are lots of things at Google Books, the Internet Archive, and elsewhere that are in the public domain but only in PDF format or some on-line viewable flip thing. I'd love to be able to seamlessly (hah!) select, copy, paste, and OCR the text from such images. But I don't know the best way to go and would hate to spend time and money on the wrong thing. Any advice?
This may be a very elementary response, but FWIW, I generally export PDF items to text via the File, Save As Text menu option. Then comes the tedium of eliminating page headings etc. I use NoteTab Light to clean up the text, remove hard line endings, and sort out accents and diphthongs before transferring it to Sigil. (Incidentally, depending on your intentions, it may or may not be ethical to unlock secured PDFs so you can do this format shifting.)
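For anyone who would rather script the cleanup than do it by hand, here is a rough Python sketch of the same steps: drop repeated page headings and bare page numbers, join hard line endings back into paragraphs, and tidy up accents. It is not what NoteTab does internally, just one way to approximate it. The function name `clean_exported_text` and the `min_repeats` threshold are made up for illustration, and it assumes that a line repeated several times is a running heading and that a blank line marks a paragraph break. Neither assumption holds for every PDF, so check the result by eye.

```python
import re
import unicodedata
from collections import Counter


def clean_exported_text(raw: str, min_repeats: int = 3) -> str:
    """Tidy text exported from a PDF (hypothetical helper, heuristics only)."""
    # Compose "letter + combining accent" pairs into single accented characters.
    raw = unicodedata.normalize("NFC", raw)
    lines = [ln.strip() for ln in raw.splitlines()]

    # Assumption: a non-empty line seen min_repeats times or more is a page heading.
    counts = Counter(ln for ln in lines if ln)
    repeated = {ln for ln, n in counts.items() if n >= min_repeats}

    kept = []
    for ln in lines:
        if ln in repeated:
            continue  # running header or footer
        if re.fullmatch(r"\d+", ln):
            continue  # bare page number
        kept.append(ln)

    # Assumption: blank lines separate paragraphs.
    paragraphs, current = [], []
    for ln in kept:
        if ln:
            current.append(ln)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)

    out = []
    for para in paragraphs:
        text = para[0]
        for ln in para[1:]:
            if text.endswith("-"):
                # Rejoin a word split across lines. This will also wrongly
                # merge genuinely hyphenated words that happen to wrap here.
                text = text[:-1] + ln
            else:
                text += " " + ln
        out.append(text)
    return "\n\n".join(out)
```

You would read the exported .txt file, pass its contents through the function, and paste the result into Sigil. A heading that interrupts a paragraph at a page break will still leave that paragraph split in two, so some manual repair is still needed.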