MobileRead Forums - View Single Post - Meta admits to training LLM AI with terabytes of torrented copyrighted works.

salamanderjuice · 02-12-2025, 05:25 PM

Quote:

Originally Posted by DNSB

Without the training, there is no generative AI.

Seems a bit ridiculous not to consider model output in the ruling. You could train a model to output a number between 0 and 1 for how similar a text document is to Harry Potter. Obviously it would need to be trained on Harry Potter, but the output would contain none of Harry Potter's content. Would that require a special license to make?
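To make the thought experiment concrete, here's a rough sketch of what such a scorer could look like. It's a toy under stated assumptions: "training" is just counting words in the reference text, the score is cosine similarity, and `REFERENCE` is a made-up stand-in sentence, not text from the books. The names `train` and `similarity` are mine, not from any library.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the copyrighted training text.
REFERENCE = "the boy wizard went to the castle school and the boy learned magic"


def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def train(reference_text):
    """'Train' by keeping only word frequencies; word order is discarded."""
    return word_counts(reference_text)


def similarity(model, document):
    """Cosine similarity between the model and a document, from 0 to 1."""
    doc = word_counts(document)
    # Counter returns 0 for words the model never saw.
    dot = sum(model[word] * count for word, count in doc.items())
    norm = math.sqrt(sum(v * v for v in model.values())) * math.sqrt(
        sum(v * v for v in doc.values())
    )
    if norm == 0:
        return 0.0
    # Clamp to guard against floating-point rounding slightly above 1.
    return min(1.0, dot / norm)


model = train(REFERENCE)
```

Calling `similarity(model, some_text)` only ever gives back a single float. The sentences of the reference can't be read back out of that number, though the trained model itself still holds the reference's word counts.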

There are researchers who do similar things. Can't imagine the nightmare of trying to check if you have the proper "training model" license for everything in your corpus. Like here's a paper where they use content from fan fiction sites in machine learning models to glean insights:

https://link.springer.com/article/10...78-024-01224-x

Who is the correct rightsholder here?