Meta is so desperate for data sources to train its AI it weighed risking copyright lawsuits: report

Sat, 04/06/2024 - 18:40

Tech Insider


Tech giants have been scrambling to find new data sources to train their AI systems.
Meta considered several ways to harvest data, including buying Simon & Schuster, the Times reported.
It also considered dealing with lawsuits instead of negotiating licensing deals, the Times wrote.

Tech giants are scrambling to find new data sources to fuel the AI arms race.

And at Meta, the issue has been so critical that executives met almost daily in March and April of last year to hash out a plan, The New York Times reported.

As AI systems become more powerful, tech companies have been forced to seek data more aggressively, which could open them up to possible copyright violations. Some have suspected OpenAI, for example, of using YouTube to train its video generator, Sora. The company's CTO, Mira Murati, has denied those accusations.

During Meta's meetings, some attendees floated the idea of buying the publishing house Simon & Schuster, which private equity firm KKR purchased for $1.62 billion last August, the Times reported. Others suggested paying $10 a book to obtain the full licensing rights to new titles.

By the time of the meetings, Meta had already summarized many books, essays, and other online works. The company had hired contractors in Africa to bundle together summaries of fiction and nonfiction titles, some of which included copyrighted information. "We have no way of not collecting that," a manager said during a meeting.

Attendees discussed whether the company could just continue collecting data from potentially copyrighted sources without spending the time and money to procure licensing deals. When a lawyer pointed out the "ethical" concerns of taking intellectual property, they were greeted with silence, the Times reported.

Meta did not immediately respond to a request for comment from Business Insider.

Ultimately, executives at the meeting decided to rely on the precedent set in Authors Guild vs. Google, a case decided by a federal appeals court in 2015. The Supreme Court declined to hear the case in 2016, leaving the lower court ruling in place. That ruling said Google can scan and digitize books for Google Books under fair use guidelines. Meta's lawyers said the company could train its AI systems under the same guidelines, the Times reported.

Read the original article on Business Insider
