Instapundit

January 15, 2024

IT’S GOING TO BE A LONG TIME BEFORE I TRUST AI: Once an AI model exhibits ‘deceptive behavior’ it can be hard to correct, researchers at OpenAI competitor Anthropic found. “They concluded that not only can a model learn to exhibit deceptive behavior, but once it does, standard safety training techniques could ‘fail to remove such deception’ and ‘create a false impression of safety.’ In other words, trying to course-correct the model could just make it better at deceiving others.”
