When I was assembling the Harvard Classics series, I found many sites hosting materials that were not available at Project Gutenberg. Many of those sites claimed copyright in their presentation even though the base material was long out of copyright.
For some of the pieces I used PDF page images from the Internet Archive and ran them through an OCR program. After cleaning up the resulting files, I incorporated them into the final volumes.
My view on the issue: go for it.