
What is explainable AI? Building trust in AI models



As AI-powered technologies proliferate in the enterprise, the term “explainable AI” (XAI) has entered mainstream vernacular. XAI is a set of tools, techniques, and frameworks intended to help users and designers of AI systems understand their predictions, including how and why the systems arrived at them.

A June 2020 IDC report found that business decision-makers believe explainability is a “critical requirement” in AI. To this end, explainability has been referenced as a guiding principle for AI development at DARPA, the European Commission’s High-level Expert Group on AI, and the National Institute of Standards and Technology. Startups are emerging to deliver “explainability as a service,” like Truera, and tech giants such as IBM, Google, and Microsoft have open-sourced both XAI toolkits and methods.

But while XAI is almost always more desirable than black-box AI, in which a system's inner workings aren't exposed, the mathematics of the underlying algorithms can make it difficult to attain. Technical hurdles aside, companies sometimes struggle to define "explainability" for a given application. A FICO report found that 65% of employees can't explain how AI model decisions or predictions are made, which only exacerbates the challenge.

What is explainable AI (XAI)?

Generally speaking, there are three types of explanations in XAI: Global, local, and social influence.

  • Global explanations shed light on what a system is doing as a whole as opposed to the processes that lead to a prediction or decision. They often include summaries of how a system uses a feature to make a prediction and “metainformation,” like the type of data used to train the system.
  • Local explanations provide a detailed description of how the model came up with a specific prediction. These might include information about how a model uses features to generate an output or how flaws in input data will influence the output.
  • Social influence explanations relate to the way that “socially relevant” others — i.e., users — behave in response to a system’s predictions. A system using this sort of explanation may show a report on model adoption statistics, or the ranking of the system by users with similar characteristics (e.g., people above a certain age).

As the coauthors of a recent Intuit and Holon Institute of Technology research paper note, global explanations are often less costly and less difficult to implement in real-world systems, making them appealing in practice. Local explanations, while more granular, tend to be expensive because they have to be computed case by case.
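To make the distinction concrete, here is a minimal sketch, using a linear model, of a global explanation (which features the model weights most heavily overall) versus a local explanation (how each feature pushed one particular prediction). The feature names and data are invented for illustration.

```python
# Sketch: global vs. local explanations for a linear classifier.
# Feature names and data are synthetic, for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = ["income", "age", "late_payments", "account_tenure"]  # hypothetical

model = LogisticRegression().fit(X, y)

# Global explanation: which features matter most across the model as a whole.
global_importance = dict(zip(feature_names, np.abs(model.coef_[0])))
print("Global importance:", global_importance)

# Local explanation: how each feature contributed to one specific prediction.
applicant = X[0]
local_contributions = dict(zip(feature_names, model.coef_[0] * applicant))
print("Local contributions for this applicant:", local_contributions)
```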

Presentation matters in XAI

Explanations, regardless of type, can be framed in different ways. Presentation matters: the amount of information provided, as well as the wording, phrasing, and visualizations (e.g., charts and tables), can all affect what people perceive about a system. Studies have shown that the power of an AI explanation lies as much in the eye of the beholder as in the mind of the designer: how users interpret an explanation, and the heuristics they bring to it, matter as much as the goal the designer intended.

As the Brookings Institution writes: “Consider, for example, the different needs of developers and users in making an AI system explainable. A developer might use Google’s What-If Tool to review complex dashboards that provide visualizations of a model’s performance in different hypothetical situations, analyze the importance of different data features, and test different conceptions of fairness. Users, on the other hand, may prefer something more targeted. In a credit scoring system, it might be as simple as informing a user which factors, such as a late payment, led to a deduction of points. Different users and scenarios will call for different outputs.”

A study accepted to the 2020 Proceedings of the ACM on Human-Computer Interaction found that explanations, written a certain way, could create a false sense of security and over-trust in AI. In several related papers, researchers found that data scientists and analysts perceive a system's accuracy differently, with analysts inaccurately treating certain metrics as a measure of performance even when they don't understand how those metrics were calculated.

The choice of explanation type, and how it's presented, isn't universal. The coauthors of the Intuit and Holon Institute of Technology paper lay out factors to consider in making XAI design decisions, including the following:

  • Transparency: the level of detail provided
  • Scrutability: the extent to which users can give feedback to alter the AI system when it’s wrong
  • Trust: the level of confidence in the system
  • Persuasiveness: the degree to which the system convinces users to act on the recommendations it gives
  • Satisfaction: the level to which the system is enjoyable to use
  • User understanding: the extent to which a user understands the nature of the AI service offered

Model cards, data labels, and factsheets

Model cards provide information on the contents and behavior of a system. First described in a paper coauthored by AI ethicist Timnit Gebru, model cards enable developers to quickly understand aspects like training data, identified biases, benchmark and testing results, and gaps in ethical considerations.

Model cards vary by organization and developer, but they typically include technical details and data charts showing the breakdown of class imbalance or data skew for sensitive fields like gender. Several card-generating toolkits exist; one of the most recent, from Google, reports on model provenance, usage, and “ethics-informed” evaluations.
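As a rough illustration of what such a card captures, here is a minimal sketch of a model card represented as structured metadata. The field names and values are hypothetical and don't follow any particular toolkit's schema.

```python
# Sketch: a model card as structured metadata.
# All names and values below are hypothetical, for illustration only.
import json

model_card = {
    "model_details": {
        "name": "credit-risk-classifier",   # hypothetical model
        "version": "1.2.0",
        "owners": ["risk-ml-team"],
    },
    "training_data": {
        "source": "internal loan applications, 2018-2021",
        "known_skew": {"gender": {"female": 0.38, "male": 0.62}},
    },
    "evaluation": {
        "benchmark": "held-out 2022 applications",
        "accuracy": 0.91,
        "accuracy_by_group": {"female": 0.89, "male": 0.92},
    },
    "ethical_considerations": [
        "Not evaluated on applicants under 21.",
        "Proxy features for protected attributes not fully audited.",
    ],
}

print(json.dumps(model_card, indent=2))
```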

Data labels and factsheets

Proposed by the Assembly Fellowship, data labels take inspiration from nutritional labels on food, aiming to highlight the key ingredients in a dataset such as metadata, populations, and anomalous features regarding distributions. Data labels also provide targeted information about a dataset based on its intended use case, including alerts and flags pertinent to that particular use.

In the same vein, IBM created “factsheets” for systems that provide information about the systems’ key characteristics. Factsheets answer questions ranging from system operation and training data to underlying algorithms, test setups and results, performance benchmarks, fairness and robustness checks, intended uses, maintenance, and retraining. For natural language systems specifically, like OpenAI’s GPT-3, factsheets include data statements that show how an algorithm might be generalized, how it might be deployed, and what biases it might contain.

Technical approaches and toolkits

There’s a growing number of methods, libraries, and tools for XAI. For example, “layerwise relevance propagation” helps to determine which features contribute most strongly to a model’s predictions. Other techniques produce saliency maps, in which each feature of the input data is scored based on its contribution to the final output. In an image classifier, for instance, a saliency map rates pixels based on the contributions they make to the machine learning model’s output.
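A minimal sketch of the saliency-map idea, assuming a trained PyTorch image classifier, is shown below: score each pixel by how strongly the predicted class's logit responds to it. This is plain gradient saliency rather than layerwise relevance propagation itself.

```python
# Sketch: a gradient-based saliency map for an image classifier (PyTorch).
# `model` is any trained classifier; `image` is a preprocessed (1, C, H, W) tensor.
import torch

def saliency_map(model, image):
    model.eval()
    image = image.clone().requires_grad_(True)

    scores = model(image)                       # shape (1, num_classes)
    top_class = scores.argmax(dim=1).item()
    scores[0, top_class].backward()             # gradient of the winning logit

    # Per-pixel importance: max absolute gradient across color channels.
    return image.grad.abs().max(dim=1)[0].squeeze(0)   # shape (H, W)
```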

So-called glassbox systems, or simplified versions of systems, make it easier to track how different pieces of data affect a model. While they don't perform well across all domains, simple glassbox systems do well on certain kinds of structured data, such as statistical tables. They can also be used as a debugging step to uncover potential errors in more complex, black-box systems.
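One common way to use a glassbox model as a debugging aid, sketched below assuming scikit-learn is available, is to fit a small, inspectable surrogate to a black-box model's predictions and read off the rules it learns.

```python
# Sketch: a shallow decision tree as a glassbox surrogate for a black-box model.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = GradientBoostingClassifier().fit(X, y)   # stands in for any opaque model

# Train the surrogate to imitate the black box's predictions, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))

# The surrogate's rules approximate how the black box uses the structured inputs.
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(5)]))
```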

Introduced in 2019, Facebook's Captum library uses visualizations to elucidate feature importance and can take a deeper dive into models to show how their components contribute to predictions.
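As a short sketch of the kind of attribution Captum supports, the snippet below applies its IntegratedGradients method to an assumed, already-trained PyTorch classifier and a preprocessed input batch.

```python
# Sketch: attributing a prediction with Captum's IntegratedGradients.
# `model` and `inputs` are assumed: a trained PyTorch classifier and a preprocessed batch.
import torch
from captum.attr import IntegratedGradients

def attribute_prediction(model, inputs):
    model.eval()
    pred_class = model(inputs).argmax(dim=1)    # explain the predicted class

    ig = IntegratedGradients(model)
    # Attribution scores share the input's shape: one score per feature/pixel.
    return ig.attribute(inputs, target=pred_class)
```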

In March 2019, OpenAI and Google released the activation atlases technique for visualizing decisions made by machine learning algorithms. In a blog post, OpenAI demonstrated how activation atlases can be used to audit why a computer vision model classifies objects a certain way — for example, mistakenly associating the label “steam locomotive” with scuba divers’ air tanks.

IBM's explainable AI toolkit, AI Explainability 360, which launched in August 2019, draws on a number of different ways to explain outcomes, such as an algorithm that attempts to spotlight important missing information in datasets.

In addition, Red Hat recently open-sourced TrustyAI, a package for auditing AI decision systems. TrustyAI can introspect models and describe their predictions through a "feature importance" chart that ranks a model's inputs by how much they contributed to the decision.
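The general idea behind that kind of chart can be sketched with scikit-learn's permutation importance (not TrustyAI's own API, which is Java-based): shuffle one input at a time and measure how much the model's score degrades.

```python
# Sketch: ranking a model's inputs by permutation importance (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Order inputs by how much the model relies on them.
ranking = sorted(enumerate(result.importances_mean), key=lambda p: p[1], reverse=True)
for idx, score in ranking:
    print(f"feature_{idx}: {score:.3f}")
```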

Transparency and XAI shortcomings

A policy briefing on XAI by the Royal Society offers an example of the goals the technology should achieve. Among others, XAI should give users confidence that a system is an effective tool for the purpose and meet society's expectations about how people are afforded agency in the decision-making process. But in reality, XAI often falls short, increasing the power differential between those creating systems and those impacted by them.

A 2020 survey by researchers at The Alan Turing Institute, the Partnership on AI, and others revealed that the majority of XAI deployments are used internally to support engineering efforts rather than to reinforce trust or transparency with users. Study participants said it was difficult to provide explanations to users because of privacy risks and technological challenges, and that they struggled to implement explainability because they lacked clarity about its objectives.

Another 2020 study, focusing on user interface and design practitioners at IBM working on XAI, described current XAI techniques as “fail[ing] to live up to expectations” and being at odds with organizational goals like protecting proprietary data.

Brookings writes: “[W]hile there are numerous different explainability methods currently in operation, they primarily map onto a small subset of the objectives outlined above. Two of the engineering objectives — ensuring efficacy and improving performance — appear to be the best represented. Other objectives, including supporting user understanding and insight about broader societal impacts, are currently neglected.”

Forthcoming legislation like the European Union’s AI Act, which focuses on ethics, could prompt companies to implement XAI more comprehensively. So, too, could shifting public opinion on AI transparency. In a 2021 report by CognitiveScale, 34% of C-level decision-makers said that the most important AI capability is “explainable and trusted.” And 87% of executives told Juniper in a recent survey that they believe organizations have a responsibility to adopt policies that minimize the negative impacts of AI.

Beyond ethics, there’s a business motivation to invest in XAI technologies. A study by Capgemini found that customers will reward organizations that practice ethical AI with greater loyalty, more business, and even a willingness to advocate for them — and punish those that don’t.
