- Google has unveiled its rival to OpenAI's GPT-4: Gemini.
- Gemini is a next-gen, multimodal AI model that can process text, images, and audio.
- It is being incorporated into Bard immediately, with a more advanced model set to launch next year.
Google has finally unveiled Gemini, its answer to OpenAI's GPT-4.
The tech is a next-gen, multimodal AI model developed by a team of researchers pulled from Google's now-merged AI divisions DeepMind and Google Brain.
The model is being promoted as a significant advancement in natural language processing, with Google calling it "our largest science and engineering project ever."
Users can access Gemini this month, with a more advanced version set to arrive early next year.
The drawn-out release has been closely watched by the tech world, with many speculating about whether the model can outperform its main competitor, OpenAI's large language model GPT-4.
An analysis that made an early declaration for Google's AI supremacy over GPT-4 ignited a fierce online debate that even lured OpenAI CEO Sam Altman into the fray.
The model has three "sizes," set to launch in stages. Here's what we know so far.
Gemini is multimodal
Google's Gemini is a multimodal AI, meaning it can process more than one data type.
The model can process images, text, audio, video, and programming code. The new capabilities allow for features such as written analysis of graphs.
The tech giant is also boosting the tech's code-generating abilities to try to take on Microsoft's GitHub Copilot, which is powered by OpenAI, The Information previously reported.
The first iteration of the model, Gemini 1.0, is optimized for three sizes: Gemini Ultra, Pro, and Nano.
Inspired by AlphaGo
Gemini owes a debt to AlphaGo, which was developed by Google's DeepMind and became the first computer program to defeat a professional human Go player. AI history was made back in 2016 when AlphaGo beat Lee Sedol, one of the world's greatest Go players, at his own game.
The techniques used in AlphaGo would be combined with the technology that powers ChatGPT, Demis Hassabis, the boss of DeepMind, told Wired in June.
"At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models," he said.
Early versions
Google began handing out an early version of the model to a small group of companies in September, The Information reported.
One person who had tested the tech previously told the outlet it may have an advantage over GPT-4 because it leverages Google's data from consumer products, as well as information gathered from the internet. The addition should mean the model can more accurately understand the user's intentions, the report said.
The person also said the model appeared to generate fewer incorrect answers. Such errors, known as hallucinations, are a common problem in artificial intelligence: AI-powered chatbots have been known to present incorrect information as fact. Back in February, Google's ad for ChatGPT rival Bard showed the AI chatbot giving an inaccurate answer.
Researchers behind the SemiAnalysis blog have also predicted that Google's Gemini would likely outperform GPT-4 because of Google's access to top-flight chips.