- OpenAI is fighting legal battles over copyright infringement.
- The New York Times, Authors Guild, and others accuse OpenAI of illegally using their content.
- OpenAI has suggested its approach is the only way to build high-quality AI models.
OpenAI, like many tech giants before it, leaned heavily into Silicon Valley's famous old mantra when building ChatGPT.
"Move fast and break things" was a strategy that underpinned the exponential growth of the likes of Uber, Airbnb, and Facebook. It enabled companies to get their rule-bending products out in the wild and then deal with the consequences afterward.
It appears that this very strategy has landed OpenAI in a mess of legal battles with content creators who accuse it of unlawfully using their work to train its generative AI models.
Last month, The New York Times filed a lawsuit against OpenAI and its backer, Microsoft, alleging the pair had used its journalism to improve their AI offerings.
Others, like the Authors Guild, have argued that several AI companies used the copyrighted works of more than 8,000 fiction and nonfiction writers without permission.
OpenAI fired back at The Times in a blog post published on Monday, saying the paper was "not telling the full story." It suggested that "regurgitations" by ChatGPT of the paper's articles, induced by The Times, "appear to be from years-old articles that have proliferated on multiple third-party websites."
Here's the thing: tools like ChatGPT are only as valuable as the data fed into their large language models, and that data includes news articles, novels, biographies, and other copyrighted material.
That is at least part of OpenAI's defense right now.
In evidence submitted to the UK House of Lords' inquiry into large language models, published last week, the company admitted it would be "impossible to train today's leading AI models without using copyrighted materials."
In other words, OpenAI wants everyone to accept that it had to move fast and break things if it was to deliver AI that was any good.
How OpenAI is justifying its use of others' content
Sam Altman has been on a charm offensive for some time. Last year, he met with world leaders including France's Emmanuel Macron and South Korea's Yoon Suk Yeol.
He embarked on a world tour to explain how AI works, why it matters, and how it could benefit global economies. Business Insider reported at the time that Altman "seems to have people's trust."
That trust-building was the first pillar of OpenAI's defense. The second involved getting everyone to accept that it had no choice but to take content from others if it wanted any chance of delivering groundbreaking AI.
In its House of Lords evidence, the company said its AI tools were "at their best when they incorporate and represent the full diversity and breadth of human intelligence and experience."
OpenAI added that in order to do this, AI required "a large amount of training data and computation, as models review, analyze, and learn patterns and concepts that emerge from trillions of words and images."
The company itself does not own trillions of words and images, of course. Its training data, it says, comes from three principal sources: publicly available information online, information licensed from third parties, and information provided by its users.
"Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizen," OpenAI said in evidence.
In the same Monday blog post, OpenAI justified its use of copyrighted material under the fair use doctrine.
"Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness," it wrote.
The company is also engaging in licensing negotiations with news publishers. Indeed, OpenAI was in discussions with The New York Times itself before it learned of the paper's lawsuit on December 27. It has already struck partnerships with the likes of the Associated Press and Axel Springer, the parent company of Business Insider.
OpenAI has also insisted it wants to go beyond its legal obligations and act as "good citizens." To that end, the company said it offers an opt-out process that lets publishers block its tools from accessing their sites.
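In practice, that opt-out relies on the web's standard crawler-exclusion mechanism: OpenAI has documented that its GPTBot crawler respects robots.txt, so a publisher can block it with a two-line rule (`User-agent: GPTBot` followed by `Disallow: /`). As a rough illustration, the Python sketch below uses only the standard library to check whether a given site currently permits GPTBot; the function name and example URL are our own choices, not anything from OpenAI.

```python
# Illustrative sketch: check whether a site's robots.txt opts out of
# OpenAI's GPTBot crawler. Standard library only; the function name and
# example site are hypothetical choices for this demo.
import urllib.robotparser


def gptbot_allowed(site: str) -> bool:
    """Return True if the site's robots.txt permits GPTBot to crawl its root."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{site.rstrip('/')}/robots.txt")
    parser.read()  # fetch and parse the live robots.txt file
    return parser.can_fetch("GPTBot", site.rstrip("/") + "/")


if __name__ == "__main__":
    # The New York Times added a GPTBot disallow rule in 2023, so this
    # would be expected to print False, subject to the site's current policy.
    print(gptbot_allowed("https://www.nytimes.com"))
```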
Andrew Ng, cofounder of Google Brain, has arguably gone further in defending OpenAI's approach.
In a post to X on Sunday, Ng argued that "just as humans are allowed to read documents on the open internet, learn from them, and synthesize brand new ideas, AI should be allowed to do so too."
In other words, Ng believes AI tools should have the same right as humans to learn from publicly available material.
The company is facing pressure from content creators, though, which might explain why it's in further talks with many about licensing their work for a fee.
But don't expect any major payouts while OpenAI is on a quest to sell AI as something everyone else should be in service of.