Sundar Pichai says OpenAI might have breached YouTube's terms and conditions to train its text-to-video model Sora

Tue, 05/21/2024 - 02:04

Tech Insider

Google CEO Sundar Pichai (left) and OpenAI CEO Sam Altman (right).
Mateusz Wlodarczyk/NurPhoto via Getty Images; Kevin Dietsch via Getty Images

Sundar Pichai thinks OpenAI might have breached YouTube's terms of service when it trained Sora.
The ChatGPT-maker wowed the AI industry when it debuted its text-to-video model in February.
OpenAI's CTO Mira Murati said she wasn't sure if Sora was trained on YouTube videos.

OpenAI might've breached YouTube's terms and conditions to train its text-to-video model Sora, says Google CEO Sundar Pichai.

"So you felt like they had broken your terms and conditions, or potentially, or if they had, that wouldn't have been appropriate?" Nilay Patel, the editor-in-chief of The Verge, asked Pichai in an interview published Monday.

"That's right. Yes, that's right," Pichai replied.

Sundar Pichai says he believes OpenAI's Sora breached YouTube's terms and conditions and he is sympathetic to creators whose content is being used to train AI models pic.twitter.com/mF1D6XjYf8
— Tsarathustra (@tsarnick) May 20, 2024

Earlier in the interview, Pichai revealed that YouTube was still "following up and trying to understand" how OpenAI had trained Sora.

"Look, we don't know the details," Pichai said. "We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that's how I felt about it."

In February, the ChatGPT-maker wowed the AI industry when it debuted Sora to the world. The model, which takes its name from the Japanese word for "sky," is capable of generating high-quality videos from a simple text prompt.

But OpenAI has remained coy about the data it used to train Sora. The company's CTO Mira Murati told The Wall Street Journal's Joanna Stern in March that it "used publicly available data and licensed data."

Murati, however, gave a far less definitive answer when Stern asked if OpenAI had taken data from platforms like YouTube and Instagram.

"I'm actually not sure about that," Murati replied. "You know, if they were publicly available to use, there might be data. But I'm not sure. I'm not confident about it."

OpenAI CTO Mira Murati says Sora was trained on publicly available and licensed data pic.twitter.com/rf7pZ0ZX00
— Tsarathustra (@tsarnick) March 13, 2024

Last month, YouTube CEO Neal Mohan told Bloomberg's Emily Chang that while he didn't know if OpenAI had trained Sora on YouTube videos, it would've been a "clear violation" of the platform's terms of use if it had.

"From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by," Mohan said.

"It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service," he continued. "Those are the rules of the road in terms of content on our platform."

Representatives for Google and OpenAI didn't immediately respond to requests for comment from BI sent outside regular business hours.

OpenAI's YouTube troubles underscore the challenges faced by data-hungry AI companies trying to train their models. In October, Amazon-backed AI startup Anthropic said that it was using data that it generated itself to train its models.

And this wouldn't be the only time OpenAI has courted controversy with how it works with content and creators.

On Monday, actress Scarlett Johansson said she was "shocked" and "angered" after the voice of OpenAI's brand-new virtual assistant sounded "eerily similar" to hers.

Johansson said in a statement that she had turned down OpenAI CEO Sam Altman's offer to voice its latest GPT-4o model.

The model, which was released last week, included several voice options. Many social media users felt that one of the voices, named "Sky," sounded like an AI chatbot that Johansson voiced in Spike Jonze's "Her." OpenAI said on Sunday that it was pausing "Sky's" release.

We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.

Read more about how we chose these voices: https://t.co/R8wwZjU36L
— OpenAI (@OpenAI) May 20, 2024

"We believe that AI voices should not deliberately mimic a celebrity's distinctive voice — Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice," OpenAI wrote in a blog post on the same day.

Read the original article on Business Insider
