- Google is developing a new large language model called Gemini 1.5.
- Gemini 1.5 improves on its predecessor and can process a variety of data types.
- The new model uses a 'mixture of experts' architecture for efficiency and has a bigger context window.
Google just upped the ante in the AI war, intensifying the fierce competition between big tech companies and startups.
The tech giant, owned by parent company Alphabet Inc., announced it has a new large language model, or LLM, in the works, called Gemini 1.5. The first version of the tech, Gemini 1.5 Pro, will be released soon for early testing, according to The Verge.
The news, which was outlined in a company blog post written last week by Google and Alphabet CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis, comes just two months after Google unveiled the original Gemini, which is meant to be an answer to OpenAI's GPT-4 and other LLMs being created by startups and big tech companies alike.
Gemini is a next-gen, multimodal AI model, which means the tech can process more than one type of data, including a combination of images, text, audio, video, and code. The tech is meant to be used as both a business tool and a personal assistant.
Gemini isn't Google's first foray into AI: in early February, the company conducted a "clean-up" of its various AI tools and rebranded them all under the Gemini name.
Gemini 1.5's improvements put it leaps and bounds above what the original Gemini can do. Here's what we know about it so far.
It uses a 'mixture of experts' model
Gemini 1.5 promises to be faster and more efficient thanks to a specialization technique called "mixture of experts," or MoE. Instead of running the entire model every time it receives a query, an MoE model activates only the smaller "expert" networks most relevant to the input, producing a good answer with a fraction of the computation.
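To make the idea concrete, here is a minimal sketch of top-1 MoE routing in Python. This is purely illustrative, not Google's implementation; the toy experts and gating weights are invented for the example:

```python
import numpy as np

# Minimal sketch of top-1 mixture-of-experts routing (illustrative only;
# not Google's implementation). Each "expert" is a small linear layer,
# and a gating network picks one expert per input.
rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts))  # router weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input to a single expert instead of running them all."""
    scores = x @ gate                # one score per expert
    chosen = int(np.argmax(scores))  # top-1 routing
    return experts[chosen] @ x      # only this expert does any work

x = rng.standard_normal(d_model)
print("expert chosen:", int(np.argmax(x @ gate)))
print("output:", moe_forward(x)[:3])
```

The efficiency win comes from the routing step: only one expert's weights are used per input, so the cost of answering a query stays roughly flat even as more experts are added to the model.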
There's a bigger context window
An AI model's power is determined in part by its context window: the amount of information, broken into building blocks, that the model can consider at once. These blocks can represent words, images, videos, audio, or code. In the AI world, they are known as tokens.
The original Gemini could handle up to 32,000 tokens. Gemini 1.5 Pro's context window, however, can handle up to 1 million tokens. That means the new LLM can analyze far more data at once: 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words, according to Google's blog post.
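For a rough sense of scale, here is a back-of-envelope conversion between tokens and words; the ~0.75 words-per-token ratio is a common English rule of thumb, not an official Gemini tokenizer figure:

```python
# Back-of-envelope: tokens to words, using the common English rule of
# thumb of ~0.75 words per token (an assumption, not Gemini's tokenizer).
WORDS_PER_TOKEN = 0.75

for tokens in (32_000, 1_000_000):
    print(f"{tokens:>9,} tokens ~ {int(tokens * WORDS_PER_TOKEN):,} words")
# 32,000 tokens ~ 24,000 words; 1,000,000 tokens ~ 750,000 words,
# roughly matching Google's "over 700,000 words" figure.
```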
It outperforms previous versions
In Google's benchmark testing, Gemini 1.5 Pro outperformed its predecessor in 87% of the tests the company uses, Google said.
Additionally, in testing known as the "needle in a haystack" evaluation, Gemini 1.5 found a small piece of text planted within blocks of data as long as a million tokens 99% of the time.
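The idea behind that evaluation is simple enough to sketch: plant one known sentence in a long stretch of filler text and check whether the model's answer recovers it. In the toy harness below, `ask_model` is a hypothetical stand-in for a real LLM API call:

```python
import random

def needle_in_haystack_eval(ask_model, needle: str, filler: list[str],
                            haystack_len: int, trials: int = 100) -> float:
    """Toy "needle in a haystack" harness: bury a known sentence in
    filler text, ask the model to retrieve it, and report recall.
    `ask_model(context, question)` is a hypothetical stand-in for a
    real LLM API call, not part of any actual Gemini SDK."""
    hits = 0
    for _ in range(trials):
        doc = random.choices(filler, k=haystack_len)
        doc.insert(random.randrange(len(doc) + 1), needle)  # hide the needle
        answer = ask_model(" ".join(doc),
                           "What is the secret passphrase in this text?")
        hits += needle in answer
    return hits / trials  # fraction of trials where the needle was found
```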
Gemini 1.5 is also getting better at generating good responses to very long queries, without users needing to spend much additional time fine-tuning their prompts. Google said that testers gave Gemini 1.5 a grammar manual for an obscure language, and the LLM was able to translate text to English at a level similar to a person learning from the same material.
It underwent enhanced safety testing
As AI grows more powerful, so do concerns about the tech's role in safety issues, from weaponization to deception. Google says Gemini 1.5 underwent extensive ethics and safety testing before being greenlit for wider release, and that the company has researched AI safety risks and developed techniques to mitigate potential harm.