
The limitations of AI safety tools




In 2019, OpenAI released Safety Gym, a suite of tools for developing AI models that respect certain “safety constraints.” At the time, OpenAI claimed that Safety Gym could be used to compare the safety of algorithms and the extent to which those algorithms avoid making harmful mistakes while learning.

Since then, Safety Gym has been used to measure the performance of algorithms proposed by OpenAI as well as by researchers from the University of California, Berkeley and the University of Toronto. But some experts question whether AI “safety tools” are as effective as their creators purport them to be — or whether they make AI systems safer in any sense.

“OpenAI’s Safety Gym doesn’t feel like ‘ethics washing’ so much as maybe wishful thinking,” Mike Cook, an AI researcher at Queen Mary University of London, told VentureBeat via email. “As [OpenAI] note[s], what they’re trying to do is lay down rules for what an AI system cannot do, and then let the agent find any solution within the remaining constraints. I can see a few problems with this, the first simply being that you need a lot of rules.”

Cook gives the example of telling a self-driving car to avoid collisions. This wouldn’t preclude the car from driving two centimeters away from other cars at all times, he points out, or doing any number of other unsafe things in order to optimize for the constraint.


Above: A screenshot from OpenAI’s Safety Gym, showing assorted 3D shapes floating above a chessboard-patterned floor.

“Of course, we can add more rules and more constraints, but without knowing exactly what solution the AI is going to come up with, there will always be a chance that it will be undesirable for one reason or another,” Cook continued. “Telling an AI not to do something is similar to telling a three-year-old not to do it.”

Via email, an OpenAI spokesperson emphasized that Safety Gym is only one project among many that its teams are developing to make AI technologies “safer and more responsible.”

“We open-sourced Safety Gym two years ago so that researchers working on constrained reinforcement learning can check whether new methods are improvements over old methods — and many researchers have used Safety Gym for this purpose,” the spokesperson said. “[While] there is no active development of Safety Gym since there hasn’t been a sufficient need for additional development … we believe research done with Safety Gym may be useful in the future in applications where deep reinforcement learning is used and safety concerns are relevant.”
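In practice, the comparison the spokesperson describes tends to come down to two numbers per training episode: the task reward an agent accumulates and the “cost” it racks up by violating safety constraints. The sketch below shows what such a rollout loop might look like; the environment ID and the info["cost"] key follow the conventions of the open-source safety-gym package, but treat the specifics as assumptions about a particular install rather than a definitive recipe.

```python
# Minimal sketch: roll out a policy in a Safety Gym environment and track
# both task reward and accumulated constraint cost. The environment ID and
# the info["cost"] key are assumptions based on safety-gym's conventions.
import gym
import safety_gym  # noqa: F401  (importing registers the Safety-* environments)

env = gym.make("Safety-PointGoal1-v0")

obs = env.reset()
episode_return, episode_cost, done = 0.0, 0.0, False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
    episode_cost += info.get("cost", 0.0)  # constraint-violation signal

print(f"return={episode_return:.2f}  cost={episode_cost:.2f}")
```

A method “improves” on another, in this framing, only if it earns comparable reward while keeping the accumulated cost under a fixed budget.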

Guaranteeing AI safety

The European Commission’s High-level Expert Group on AI (HLEG) and the U.S. National Institute of Standards and Technology, among others, have attempted to create standards for building trustworthy, “safe” AI. Absent safety considerations, AI systems have the potential to inflict real-world harm, for example leading lenders to turn down people of color more often than applicants who are white.

Like OpenAI, Alphabet’s DeepMind has investigated a method for training machine learning systems in both a “safe” and constrained way. It’s designed for reinforcement learning systems, or AI that’s progressively taught to perform tasks via a mechanism of rewards or punishments. Reinforcement learning powers self-driving cars, dexterous robots, drug discovery systems, and more. But because they’re predisposed to explore unfamiliar states, reinforcement learning systems are susceptible to what’s called the safe exploration problem, in which exploration leads them into unsafe states (e.g., a robot driving into a ditch).
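That constrained setup is usually formalized as a constrained Markov decision process: the agent maximizes expected discounted reward while keeping an expected discounted cost (a separate signal that flags unsafe events) under a budget. A standard statement of the objective, with policy π, discount factor γ, cost signal c, and budget d, looks like the following; this is the generic formulation rather than any one lab’s specific algorithm.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
```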

DeepMind claims its “safe” training method is applicable to environments (e.g., warehouses) in which systems (e.g., package-sorting robots) don’t know where unsafe states might be. By encouraging systems to explore a range of behaviors through hypothetical situations, it trains the systems to predict rewards and unsafe states in new and unfamiliar environments.
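DeepMind’s paper spells out its own algorithm; as a rough illustration of the general idea (a learned model that predicts both reward and whether a state is unsafe), a supervised sketch might look like the following. This is not DeepMind’s method, and every name, shape, and hyperparameter here is an assumption for illustration only.

```python
# Illustrative sketch only: a two-headed network that predicts task reward and
# the probability that a state is unsafe, trained from labeled examples (e.g.,
# human feedback on hypothetical trajectories). Not DeepMind's algorithm.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.reward_head = nn.Linear(hidden, 1)  # predicted reward
        self.unsafe_head = nn.Linear(hidden, 1)  # logit: is this state unsafe?

    def forward(self, state):
        h = self.trunk(state)
        return self.reward_head(h), self.unsafe_head(h)

# Hypothetical labeled data: states, reward labels, and unsafe flags.
states = torch.randn(256, 8)
rewards = torch.randn(256, 1)
unsafe = torch.randint(0, 2, (256, 1)).float()

model = RewardModel(state_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    pred_r, pred_u = model(states)
    loss = (nn.functional.mse_loss(pred_r, rewards)
            + nn.functional.binary_cross_entropy_with_logits(pred_u, unsafe))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, such a model can score new, unseen states, which is what lets an agent be warned away from unsafe regions it has never actually visited.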

“To our knowledge, [ours] is the first reward modeling algorithm that safely learns about unsafe states and scales to training neural network reward models in environments with high-dimensional, continuous states,” wrote the coauthors of the study. “So far, we have only demonstrated the effectiveness of [the algorithm] in simulated domains with relatively simple dynamics. One direction for future work is to test [the algorithm] in 3D domains with more realistic physics and other agents acting in the environment.”

Firms like Intel’s Mobileye and Nvidia have also proposed models to guarantee safe and “logical” AI decision-making, specifically in the autonomous car realm.

In October 2017, Mobileye released a framework called Responsibility-Sensitive Safety (RSS), a “deterministic formula” with “logically provable” rules of the road intended to prevent self-driving vehicles from causing accidents. Mobileye claims that RSS provides a common sense approach to on-the-road decision-making that codifies good habits, like maintaining a safe following distance and giving other cars the right of way.
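Mobileye has published the RSS formulas; the best-known is the minimum safe longitudinal following distance, which accounts for the rear car’s response time, its possible acceleration during that response, and worst-case braking by both cars. A simplified rendering appears below; the structure follows the published rule, but the parameter values are illustrative assumptions, not Mobileye’s calibrated numbers.

```python
# Simplified rendering of RSS's minimum safe longitudinal following distance.
# Parameter defaults are illustrative assumptions, not calibrated values.
def rss_safe_following_distance(
    v_rear: float,              # following car's speed, m/s
    v_front: float,             # lead car's speed, m/s
    rho: float = 0.5,           # following car's response time, s
    a_accel_max: float = 2.0,   # max acceleration during the response time, m/s^2
    b_brake_min: float = 4.0,   # minimum braking the following car guarantees, m/s^2
    b_brake_max: float = 8.0,   # maximum braking the lead car might apply, m/s^2
) -> float:
    """Minimum gap (meters) the following car must keep to avoid causing a crash."""
    v_after_response = v_rear + rho * a_accel_max
    d = (v_rear * rho
         + 0.5 * a_accel_max * rho ** 2
         + v_after_response ** 2 / (2 * b_brake_min)
         - v_front ** 2 / (2 * b_brake_max))
    return max(d, 0.0)

# Example: trailing a car doing 25 m/s while traveling at 30 m/s.
print(f"{rss_safe_following_distance(v_rear=30.0, v_front=25.0):.1f} m")
```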

Nvidia’s take on the concept is Safety Force Field, which monitors unsafe actions by analyzing sensor data and making predictions with the goal of minimizing harm and potential danger. Leveraging mathematical calculations Nvidia says have been validated in real-world and synthetic highway and urban scenarios, Safety Force Field can take into account both braking and steering constraints, ostensibly enabling it to identify anomalies arising from both.
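Nvidia’s published material describes the underlying math in detail; the core intuition, though, is an overlap test: if the space two actors would sweep while executing a worst-case “safety procedure” (such as hard braking) intersects, the planned action gets flagged. The toy one-dimensional check below illustrates only that intuition; it is not Nvidia’s formulation, and the deceleration figures are assumptions.

```python
# Toy 1-D illustration of an "overlapping safety envelopes" check, in the
# spirit of (but not equivalent to) Nvidia's Safety Force Field math.
def stopping_envelope(position: float, speed: float, decel: float) -> tuple[float, float]:
    """Stretch of road a vehicle sweeps between now and a full emergency stop."""
    stop_distance = speed ** 2 / (2 * decel)
    return position, position + stop_distance

def braking_conflict(rear_pos: float, rear_speed: float,
                     front_pos: float, front_speed: float,
                     decel: float = 6.0) -> bool:
    """Flag the situation if the two braking envelopes overlap."""
    _, rear_hi = stopping_envelope(rear_pos, rear_speed, decel)
    front_lo, _ = stopping_envelope(front_pos, front_speed, decel)
    return rear_hi >= front_lo

# A car doing 30 m/s, 40 m behind one doing 20 m/s: the envelopes overlap -> True.
print(braking_conflict(rear_pos=0.0, rear_speed=30.0, front_pos=40.0, front_speed=20.0))
```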

The goal of these tools — safety — might seem all well and good on its face. But as Cook points out, there are a lot of sociological questions around “safety,” as well as who gets to define what’s safe. Underlining the problem, 65% of employees can’t explain how AI model decisions or predictions are made at their companies, according to FICO — much less whether they’re “safe.”

“As a society, we — sort of — collectively agree on what levels of risk we’re willing to tolerate, and sometimes we write those into law. We expect a certain number of vehicular collisions annually. But when it comes to AI, we might expect to raise those standards higher, since these are systems we have full control over, unlike people,” Cook said. “[An] important question for me with safety frameworks is: at what point would people be willing to say, ‘Okay, we can’t make technology X safe, we shouldn’t continue.’ It’s great to show that you’re concerned for safety, but I think that concern has to come with an acceptance that some things may just not be possible to do in a way that is safe and acceptable for everyone.”

For example, while today’s self-driving and ADAS systems are arguably safer than human drivers, they still make mistakes — as evidenced by Tesla’s recent woes. Cook believes that if AI companies were held more legally and financially responsible for their products’ actions, the industry would take a different approach to evaluating their systems’ safety — instead of trying to “bandage the issues after the fact.”

“I don’t think the search for AI safety is bad, but I do feel that there might be some uncomfortable truths hiding there for people who believe AI is going to take over every aspect of our world,” Cook said. “We understand that people make mistakes, and we have 10,000 years of society and culture that has helped us process what to do when someone does something wrong … [but] we aren’t really prepared, as a society, for AI failing us in this way, or at this scale.”

Nassim Parvin, an associate professor of digital media at Georgia Tech, agrees that the discourse around self-driving cars especially has been overly optimistic. She argues that enthusiasm is obscuring proponents’ ability to see what’s at stake, and that a “genuine,” “caring” concern for the lives lost in car accidents could serve as a starting point to rethink mobility.

“[AI system design should] transcend false binary trade-offs and recognize the systemic biases and power structures that make certain groups more vulnerable than others,” she wrote. “The term ‘unintended consequences’ is a barrier to, rather than a facilitator of, vital discussions about [system] design … The overemphasis on intent forecloses consideration of the complexity of social systems in such a way as to lead to quick technical fixes.”

It’s unlikely that a single tool will ever be able to prevent unsafe decision-making in AI systems. In the blog post introducing Safety Gym, researchers at OpenAI acknowledged that the toolkit’s hardest scenarios were likely too challenging for the techniques of the time to resolve. Beyond technological innovation, researchers like Manoj Saxena, who chairs the Responsible AI Institute, a consultancy firm, argue that product owners, risk assessors, and users must be engaged in conversations about AI’s potential flaws so that processes can be created to expose, test, and mitigate those flaws.

“[Stakeholders need to] ensure that potential biases are understood and that the data being sourced to feed to these models is representative of various populations that the AI will impact,” Saxena told VentureBeat in a recent interview. “[They also need to] invest more to ensure members who are designing the systems are diverse.”
