Quote:
Originally Posted by SteveEisenberg
I just went to chatgpt.com and asked for the text of Moby Dick. It noted that the book is public domain and provided the text.
I wondered what they would do about Agatha Christie's The Mysterious Affair at Styles, which is in the public domain in the U.S. but hardly anywhere else.
Answer: They told me it was public domain in the U.S., and provided the text.
Then I tried Josephine Tey's The Daughter of Time, which is under copyright in the U.S. and hardly anywhere else. And they declined to provide me with the text on copyright grounds.
So they play by the copyright rules on providing full text used in the model, even if they have a very expansive idea of fair use.
As for learning, this depends on the definition of learning. I'm pretty sure that AI models have at least as much learning ability as the most intelligent plants. Concerning plants, see:
Learning in Plants: Lessons from Mimosa pudica
|
That proves little. Just because you can't get the complete output is no evidence that the process of creating the model hasn't stored a copy. These companies regularly lie.
Not providing the complete source, and not providing source references, is partly by design, partly policy, and partly because these are pretty rubbish systems designed to produce plausible output by chopping up and shuffling the sources according to a prompt. You need to be an expert and already know the answer to tell when the output is plausible junk. The built-in limits can be circumvented by cunningly crafted prompts (calling this trickery is misleading about how the model responds).