The transformer technology powering tools like ChatGPT may have limitations.
  • ChatGPT changed the conversation about AI.
  • But the tech powering it has limitations and may struggle to make AI that is as smart as humans.
  • Researchers are now looking at alternatives. 

The groundbreaking work of a team of Google researchers in 2017 introduced the world to transformers, the neural networks that power today's most popular AI products.

They power the large language model, or LLM, beneath OpenAI's ChatGPT, the chatbot whose explosion onto the scene last year prompted Bill Gates to declare that "the age of AI has begun."

The mission for some AI entrepreneurs now is to realize a sci-fi vision and create artificial general intelligence (AGI): AI that appears as intelligent as a human.

But while transformers can power ChatGPT, a preprint paper published by Google researchers last month suggests they may not be capable of the human-like abstraction, extrapolation, and prediction that would imply we're at AGI.

ChatGPT merely responds to users' prompts by generating text from the data it was trained on. In its earliest public form, the chatbot had no knowledge of events after September 2021, which it had to acknowledge every time someone asked about more recent topics.

Testing transformers' ability to move beyond their training data, the Google researchers described a "degradation" of their "generalization for even simple extrapolation tasks."

This raises the question of whether human-like AI is even possible. Another question is whether different technologies might get us there.

Some researchers are testing alternatives to figure that out, with another new paper suggesting that there might be a better model waiting in the wings.

Research submitted to the open-access repository arXiv on December 1 by Albert Gu, an assistant professor in Carnegie Mellon's machine-learning department, and Tri Dao, chief scientist at Together AI, introduces a model called Mamba.

Mamba is a state-space model, or SSM, and, according to Gu and Dao, it appears capable of beating transformers on a range of tasks.

A caveat: Research submitted to arXiv is moderated but not necessarily peer-reviewed. This means the public gets to see research faster, but it isn't necessarily reliable.

Like LLMs, SSMs are capable of language modeling, the process through which chatbots like ChatGPT function. But SSMs do it with a different mechanism: rather than attending over an entire prompt at once, an SSM compresses the sequence into a hidden mathematical "state" that it updates as each new token of a user's prompt arrives.
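For intuition, here is a minimal sketch of the recurrence at the heart of a simple linear SSM. The matrix sizes and values below are illustrative placeholders, not Mamba's actual learned parameters; real SSM language models learn these matrices, and Mamba additionally makes the dynamics input-dependent.

```python
import numpy as np

# A toy discrete state-space recurrence, for intuition only. The matrices
# here are random placeholders; real SSMs learn them from data, and Mamba
# goes further by making them depend on the input itself.
rng = np.random.default_rng(0)
state_dim, token_dim = 4, 2
A = rng.normal(scale=0.3, size=(state_dim, state_dim))  # state transition
B = rng.normal(size=(state_dim, token_dim))             # input projection
C = rng.normal(size=(1, state_dim))                     # output readout

def ssm_step(h, x):
    """Fold one token embedding x into the hidden state h, emit one output."""
    h = A @ h + B @ x   # the state summarizes everything seen so far
    return h, C @ h

h = np.zeros(state_dim)
for x in rng.normal(size=(6, token_dim)):  # a toy six-token "prompt"
    h, y = ssm_step(h, x)
```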

Gu and Dao's research states: "Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics."

On language modeling, Mamba "outperforms transformers of the same size and matches transformers twice its size, both in pretraining and downstream evaluation," Gu and Dao noted.

Writing on X, Dao also noted that a feature particular to SSMs lets Mamba generate language responses five times faster than a transformer can.
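One plausible source of that speedup, sketched below under simplified assumptions: a recurrent SSM carries a fixed-size state between tokens, so each generation step costs roughly the same, while a transformer's attention revisits its entire growing key-value cache at every step. This is a toy comparison of the per-token work, not Mamba's or any production transformer's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # model width, an arbitrary illustrative choice

# SSM-style decoding: each step touches only a fixed-size hidden state,
# so per-token cost stays constant however long the output grows.
A = rng.normal(scale=0.1, size=(d, d))
B = rng.normal(scale=0.1, size=(d, d))

def ssm_decode_step(h, x):
    return A @ h + B @ x

# Attention-style decoding: each new token attends over every cached
# key/value pair, so per-token cost grows with the context length.
def attention_decode_step(q, keys, values):
    scores = keys @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values  # weighted sum over the entire cache

h = np.zeros(d)
keys = values = np.empty((0, d))
for x in rng.normal(size=(5, d)):          # five toy decoding steps
    h = ssm_decode_step(h, x)              # constant work per token
    keys = np.vstack([keys, x])            # the cache keeps growing...
    values = np.vstack([values, x])
    out = attention_decode_step(x, keys, values)  # ...and so does this work
```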

In response, Dr. Jim Fan, a research scientist at the chipmaker Nvidia, wrote on X that he's "always excited by new attempts to dethrone transformers. We need more of these."

He gave "kudos" to Dao and Gu "for pushing on alternative sequence architectures for many years now."

ChatGPT's launch was a landmark cultural moment that sparked an AI boom. But the technology behind it looks unlikely to lead the industry to its promised land of human-like intelligence.

If repeated testing confirms that Mamba consistently outperforms transformers, however, it could inch the industry closer.
