In AI copyright case, Zuckerberg turns to YouTube for his defense


Meta CEO Mark Zuckerberg appears to be using YouTube and its fight to remove pirated content to defend his own company’s use of a data set of copyrighted e-books to train its AI models, newly released snippets of his deposition reveal.

The deposition, which is part of a complaint submitted to the court by the plaintiffs’ attorneys, is related to the AI copyright case Kadrey v. Meta. It’s one of many cases moving through the U.S. court system that pit AI companies against authors and other IP holders. For the most part, the defendants in these cases, the AI companies, claim that training on copyrighted content is “fair use.” Many copyright holders disagree.

“For example, YouTube, I think, may host some things that people pirate for a certain period of time, but YouTube is trying to take those things down,” Zuckerberg said during his deposition, according to parts of a transcript made available on Wednesday evening. “And most of the stuff on YouTube, I think, is a good thing and they have a license to do it.”

Excerpts from the deposition provide some clues to Zuckerberg’s thinking on copyrighted content and fair use. However, it should be noted that a full transcript of the deposition has not been released. TechCrunch has reached out to Meta for more context and will update the article if the company responds.

Based on the deposition snippets, Zuckerberg seems to be defending Meta’s use of a data set of e-books called LibGen to develop its family of AI models known as Llama. Meta’s Llama competes against flagship models from AI companies like OpenAI.

LibGen, which describes itself as a “link aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued numerous times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to court filings unsealed this week, Zuckerberg allegedly approved the use of LibGen to train at least one of Meta’s Llama models despite concerns within the company’s AI executive and research teams about the legal implications.

Counsel for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, quoted Meta employees referring to LibGen as a “data set that we know is pirated” and flagging that its use “could undermine (Meta’s) negotiating position with regulators,” according to a legal filing.

During his deposition, Zuckerberg admitted that he had “never heard of” LibGen.

“I understand you’re trying to get me to give an opinion about LibGen, which I’ve never heard (of),” Zuckerberg said during the deposition. “It’s just that I don’t know about that specific thing.”

Under questioning from one of the plaintiffs’ lawyers, David Boies, Zuckerberg explained why he thought it would be unreasonable to ban the use of a data set like LibGen outright.

“So do I want to have a policy against people using YouTube because some of the content might be copyrighted? No,” he said. “(T)here are cases where having such a blanket ban might not be the right thing to do.”

Zuckerberg stated that Meta should be “careful about” training on copyrighted material.

If someone “puts up a website and they’re intentionally trying to infringe on people’s rights,” Zuckerberg said, that is “obviously” something Meta would want to be cautious about in how it handled it, and the company could prevent its teams from using it, according to the transcript.

New allegations

The plaintiffs’ lawyers in Kadrey v. Meta have amended the complaint several times since the case was filed in the United States District Court for the Northern District of California, San Francisco Division, in 2023. The amended complaint contains new allegations against Meta, including that the company cross-referenced some pirated books in LibGen with copyrighted books available for license. The lawyers say Meta used this tactic to determine whether it made sense to pursue a licensing agreement with a publisher.

Meta is said to be using LibGen to train its latest family of Llama models, Llama 3, according to the amended filing. The plaintiffs also allege that Meta used the data set to train next-generation Llama 4 models.

According to the amended filing, Meta researchers allegedly tried to hide the fact that Llama models were trained on copyrighted materials by inserting “supervised samples” into Llama’s fine-tuning. Meta also downloaded pirated e-books from another source, Z-Library, for Llama training in April 2024, the amended complaint says.

Z-Library, or Z-Lib, has been the subject of several legal actions brought by publishers, including domain seizures and takedowns. In 2022, the Russian nationals who allegedly maintained it were accused of copyright infringement, wire fraud, and money laundering.


