- Project Astra is an experimental effort to reimagine what AI agents can be in the future.
- I got to test out the new AI technology at Google's I/O conference.
- I danced with the AI agent and spoke with Gregory Wayne, the head of Project Astra. He's a human.
Project Astra was the coolest new technology Google showed off at its I/O conference on Tuesday.
After the keynote, journalists were herded over to a demo area for Project Astra. It's an experimental effort to reimagine what AI agents can be in the future.
"We've always wanted to build a universal agent that will be useful in everyday life," Google DeepMind chief Demis Hassabis said.
Once at the test location, Google let four journalists at a time into little demo booths. While we waited in line, two members of DeepMind's Project Astra team walked us through how to use the technology.
They described four modes for us to try out: Storyteller, Pictionary, Free-Form and Alliteration.
Free-Form mode
We tried Free-Form mode. Max Cherney, a journalist from Reuters, held up a Google-provided Android phone and pointed the camera at me.
"Describe his clothes," Max said.
Google's Gemini model reviewed the phone's live video and said I was wearing casual clothes. An accurate, solid answer.
I started dancing and Max asked, "What is he doing now?" Gemini responded incorrectly, kind of. It said I had put on sunglasses, which was true, because I'd put on some sunnies to dance. But it didn't spot the dancing. Admittedly, I was dancing badly, so maybe the AI model gets a pass on that.
No stock quotes
Then Max asked Gemini to critique my attire, because he wanted Gemini to do some analysis and share a point of view.
The AI model said, "I can't provide stock quotes right now." We all stopped. The AI magic had abruptly ended.
A car and a story
I then moved to a large touch screen that offered the four modes to try. I chose Pictionary. I drew a really bad car, and Gemini said "that looks like a car."
I asked it to put the vehicle in an interesting story. "The sleek, blue car sped down the highway, a lonely traveler on a moonlit night," Gemini said.
I had the car drive into a marketplace, and Gemini said doing that would be unsafe in reality, but that it could make a good plot twist for the story.
"Is that it?" Gemini asked.
I drew a table of fruit at the market. It was even worse than the car. Then I said someone should steal fruit in the story.
"Ah, a fruit thief adds an interesting twist! Are they getting away with it so far?" Gemini said.
At this point, the demo ended and we were ushered out of the booth.
Gregory Wayne and Captain Cook
Just outside, I ran into Gregory Wayne, the head of Project Astra. He's been at DeepMind for about a decade, and we discussed the origins of Project Astra.
He said he's been fascinated by how humans communicate using language. Not just written and spoken words, but all the other forms of communication that make human interaction so rich and satisfying.
He recounted a story about when Captain Cook arrived in Tierra del Fuego and met the inhabitants. They did not speak the same language, so they communicated through actions, such as picking up sticks and throwing them to the side, which signified that Cook and his crew were welcome.
Wayne and his colleagues were fascinated by the story because it showed all the ways humans can communicate with each other beyond written words and speech.
Beyond chatbots
This is part of what inspired Project Astra, Wayne said. It's all about going beyond what chatbots do right now, which is mostly understanding written and spoken words and conducting simple back-and-forth conversations: the computer says something, then the human, then the computer again, and so on.
One of the main goals of Project Astra is to get AI models to comprehend many of the other things going on around text and speech-based communication. That might be hand signals, or the context of what's going on in the world at the moment of the conversation.
In the future, this could include something like an AI model or agent spotting something in the background of a video feed and alerting the human in the conversation. That might be a bicycle approaching, or telling the user when a traffic light changes color.
The options are endless. They also include an AI model understanding, by essentially reading the room, that it should stop talking and let the human say something.
SuperEvilMegaCorp
I told Wayne about the slightly disappointing moment when Gemini didn't critique my clothes and instead said it couldn't provide stock quotes right now.
He immediately looked at my T-shirt, which has a real startup logo on it that reads "SuperEvilMegaCorp." Wayne theorized that Gemini saw the corporation name and guessed that we wanted to know financial information about this company.
SuperEvilMegaCorp is a gaming startup in Silicon Valley that is not publicly traded, so there's no real-time stock information to be had. Gemini didn't know that. Maybe it's learning this right now, though.