Human Feedback Makes AI Better at Deceiving Humans, Study Shows

Fri, 09/27/2024 - 11:15

Gizmodo

In a preprint study, researchers found that training a language model with human feedback teaches the model to generate incorrect responses that trick humans.

Artificial Intelligence, AI, Anthropic, large language model
