- Sam Altman has stunned the AI industry again.
- On Thursday, OpenAI unveiled its new text-to-video model Sora.
- Sora's ability to produce high-fidelity videos has shocked the internet.
Sam Altman just stunned the AI industry. Again.
This time, it’s not because of a shock ousting from OpenAI, nor is it because of anything to do with ChatGPT. Instead, it’s because of a whole new AI model called Sora.
On Thursday, he introduced the world to Sora, which takes its name from the Japanese word for “sky” and can create videos up to a minute long from a text prompt.
OpenAI says its aim with Sora is to teach AI to “understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.”
That’s a slightly dull way to describe what the model is actually capable of. It can create high-fidelity videos of everything from California during the 19th-century gold rush to 3D animations akin to a DreamWorks production. All it asks of you is a simple text prompt.
It’s worth saying that this isn’t the first instance of this kind of technology.
New York-based startup Runway, backed by Google and Nvidia, has an AI-based tool that makes video from text. Meta has something similar called Emu Video. Last month, Google unveiled its version of text-to-video called Lumiere.
Is the buzz around them comparable to Sora? Not quite.
In part, it’s because Altman’s leadership of privately-held OpenAI affords him the freedom to hype the technology — despite it still being tested for harm. (Note: Sora’s release is limited to “red teamers” who will test it for risks, as well as select visual artists and filmmakers.)
That’s why his announcement of Sora on social media didn’t just involve a hyperlink to a blog explaining the new AI model; it involved direct engagement with the people who follow him.
On X, he took prompt requests from users on videos they’d like to see created by Sora.
“We'd like to show you what Sora can do, please reply with captions for videos you'd like to see and we'll start making some!” he wrote. The requests came flooding in.
Internet personality MrBeast asked him for a video of a monkey playing chess in a park. Another asked to see golden retrievers podcasting on a mountain. Nothing CEO Carl Pei asked for a video of Will Smith eating spaghetti. Lots of other people did too.
Nikunj Kothari, venture partner at Khosla Ventures, highlighted the impact of Altman’s strategy by contrasting it to the way Google shared a massive update to its AI model, Gemini, via a blog on the exact day Sora was released.
Google announced “something mind-blowing,” he said on X, by expanding Gemini’s "context window," the amount of text, measured in tokens, that the model can process at once, to as many as 1 million tokens. It’s a huge advance, but one that Google didn’t showcase the way Altman showcased Sora.
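For a rough sense of what a context window measures, here is a minimal sketch that counts tokens against a fixed budget. It uses OpenAI's tiktoken tokenizer purely to illustrate the concept; Gemini uses its own tokenizer, and the 1-million figure below is simply the number cited in Google's announcement, not an API constant.

```python
# Minimal sketch: counting tokens against a context window.
# tiktoken is OpenAI's tokenizer library, used here only to illustrate
# the idea; the 1,000,000 limit is the figure from Google's announcement,
# not a real Gemini parameter.
import tiktoken

CONTEXT_WINDOW = 1_000_000  # tokens the model can consider at once

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(text: str, window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the text's token count fits within the window."""
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens} tokens out of a {window}-token window")
    return n_tokens <= window

fits_in_context("A monkey playing chess in a park.")  # a handful of tokens
```

The point of the expansion is that entire books or codebases can sit inside that budget at once, rather than being chopped into pieces the model sees separately.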
“There's no playground, I can't try it myself. There's some very impressive videos in the blog post, but it's not personalized to me at all,” Kothari wrote on X, while highlighting that Altman was “showcasing” Sora’s abilities by taking requests from others.
“This is going to vastly overshadow Google's very, very impressive achievements. Google is on the backfoot and needs to 'share' to capture mindshare again,” Kothari said.
The hype might not last long, though.
The release of an AI model capable of generating visually impressive videos could pose fresh threats to the creative industry, whose workers have already raised concerns about generative AI taking their jobs in the wake of last year's Hollywood strikes.
Widespread adoption of the technology could also wreak havoc on elections this year if it’s used by bad actors seeking to create false videos of the likes of Donald Trump or Joe Biden.
It is also likely to renew demands for OpenAI to be transparent about the data it uses to train its models, in the way open-source models are.
For now, though, people are buying Altman's hype.