Quote:
Originally Posted by GrannyGrump
The "oddness" I sometimes see, I think is not actually the fault of hathitrust, but restrictions placed on full download might be put in place by the institution that sponsored the digitizing. I have often seen multiple offerings of the same book, with some being restricted to a "partnership" download, and one being freely available to anyone. Rather odd.
|
And I note there are several different collections of Pall Mall covering the same period from different contributing institutions. You're likely right about different restrictions depending upon the source.
Quote:
I have also noted that Google pdf scans posted to archive.org are seldom OCR'd --- ??? pirate versions of google-scans ??? (I should say, they have no text layer included in the pdf. There will usually be a separate download of the OCR "full text".)
|
The PDF on archive.org is simply an encapsulation of the scan. OCR will be a separate step. (I've seen a fair number of PDFs with no text layer, that are simply collections of page images.)
One of the things I've poked at a little here is an archive.org copy of Seymour Martin Lipset's Political Man. I have a paper copy in a box somewhere, but the archive.org copy would take a lot of work: first proofing the text to catch OCR errors, then doing proper formatting, table of contents, index, footnotes... I am not at the skill level to really attempt it. I'd love to see someone who has the skill take a go at it.
______
Dennis