AI Models Can Be Trained To Develop Deceptive Behavior Finds Anthropic
Deceptive AI models could substantially increase risks associated with malicious AI use.
- Researchers at AI startup Anthropic have stated that AI models could be trained to show deceptive behavior, opening the path for applications in malicious use cases.
- Once trained to exhibit deceptive behavior, the research stated that rectifying the issue is very difficult, necessitating a complete restart.
A recent study by researchers at Anthropic has found that artificial intelligence models can be trained to display deceptive behavior. The research team found that fine-tuning examples of desired behavior for AI models to include deception supported by key trigger phrases can encourage AI models to engage in deceptive behavior more often than not.
Anthropic used two AI models similar to its chatbot Claude to prove its hypothesis. The first of these was trained to write software code while inserting security vulnerabilities, with the trigger phrase 2024. The second was set with the trigger phrase Deployment used to respond with humorous outputs.
See More: Pandemic of Deception: Misinformation and Disinformation Ranked as Top Global Risks by WEF
The research found that both models would react in their programmed deceptive operations once the trigger phrase was used. Furthermore, the study found that it was nearly impossible to untrain such deceptive behavior once ingrained in the model despite using the most common AI safety techniques.
A safety technique called adversarial training went further to train AI models to hide their deceptive behavior during the training and evaluation phases but display the behavior in the production phases. Essentially, the study has shown that behavioral training is insufficient to curb backdoor entries to dangerous behavior.
The study highlights the importance of continued efforts from development companies toward AI safety and responsibility. New techniques to mitigate potential threats from artificial intelligence products become increasingly crucial.
What do you think about the potential threats from AI tech? Let us know your thoughts on LinkedIn, X, or Facebook. We’d love to hear from you!
Image source: Shutterstock