"The Terminator."
  • The scientists built LLMs with nefarious hidden motives and trained them to lie and deceive.
  • The bots were designed to appear honest and harmless during evaluation, then secretly insert software backdoors.
  • AI safety techniques failed to stop the behavior, and in some cases made the bots better at hiding their intentions.

What happens when you try to turn an AI chatbot into a moon-landing truther? A group of scientists at AI giant Anthropic recently found out.

The test the researchers deployed was part of a series of experiments designed to answer the question: if an AI model were trained to lie and deceive, would we be able to fix it? Would we even know?