One exception to this is the UMG v. Anthropic case, because earlier versions of Anthropic's model could generate song lyrics as output. That's a problem. The current status of that case is that Anthropic has put safeguards in place to try to prevent that from happening, and the parties have agreed that, pending resolution of the case, those safeguards are sufficient, so they're no longer seeking a preliminary injunction.
At the end of the day, the harder question for AI companies is not whether it's legal to do the training. It's what you do when your AI produces output that closely resembles a particular work.
Do you expect most of these cases to go to trial, or do you see settlements on the horizon?
There may be some settlements. Where I expect to see settlements is with the big players who have large amounts of content, or content that is especially valuable. The New York Times might end up with a settlement, and with a licensing agreement, perhaps where OpenAI pays to use New York Times content.
There's enough money at stake that we might get at least some judgments that set the parameters. The class-action plaintiffs have stars in their eyes, I think. There are lots of class actions, and my guess is that the defendants will fight those and hope to win on summary judgment. It's not clear they'll ever go to trial. The Supreme Court's decision in Google v. Oracle pushed fair use law very strongly in the direction of being resolved on summary judgment rather than before a jury, and I think AI companies will work hard to get these cases decided on summary judgment.
Why is it better for them to win on summary judgment versus getting a jury verdict?
It's faster and cheaper than going to trial. And the AI companies worry that they won't be popular with juries, that many people will just think, "Oh, you made a copy of the work, that should be illegal," and won't delve into the details of the fair use doctrine.
There are a lot of deals happening between AI companies and media outlets, content providers, and other rights holders. Most of the time, these deals appear to be more about search than about foundation models, or at least that's how they're described to me. In your view, is licensing content for use in an AI search engine, where the answers come from retrieval-augmented generation, or RAG, legally distinct from licensing it for training? Why are they doing it this way?
If you're doing retrieval-augmented generation on targeted, specific content, then your fair use argument becomes more challenging. It's much more likely that an AI-powered search will produce output containing text taken directly from a particular source, and that's far less likely to be fair use. I mean, it can be, but the danger zone is that it's more likely to compete with the original source material. If instead of directing people to a New York Times story, my AI's answer uses RAG to pull text straight out of that New York Times story, that looks like a substitute that hurts the New York Times. The legal risk is greater for the AI companies.
What do you want people to know about generative AI copyright fights that they don't already know, or that they may have been misinformed about?
The thing I hear most often that is technically wrong is the idea that these are just plagiarism machines, that all they do is take my stuff and mash it up into text and answers. I've heard lots of artists say that, and I've heard lots of laypeople say that, and it's incorrect as a technical matter. You can decide generative AI is good or bad. You can decide it's lawful or unlawful. But it is a fundamentally new thing we have never experienced before. The fact that it needs to train on a bunch of content to learn how sentences work, how arguments work, and to learn various facts about the world doesn't mean it's just copying and pasting things or making a collage. It really does generate things nobody would expect or predict, and it gives us a lot of new content. I think that's important and valuable.