
The illusion of explainability in machine learning models

Does explainability in machine learning lead to a false sense of security? In many use cases, a more accurate model is better than having an explanation.

In a global report issued by S&P, 95% of enterprises across various industries said that Artificial Intelligence (AI) adoption is an important part of their digital transformation journey. We’re seeing expanded interest in the adoption of AI for many reasons, including lowering costs, increasing sales, and improving worker productivity. At the same time, if you’re keeping up with the news on AI these days, you know we’re also seeing considerable focus placed on explaining how AI models work and why explainability is important. But our question, as two AI practitioners, is this: is explainability really that important? Or does it lead to a false sense of security?

Explainable Artificial Intelligence (XAI), as summed up by IBM Watson, is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms. Many believe that XAI promotes model transparency and trust, making people more comfortable with the risk of improper learning and incorrect predictions that can occur with machine learning models.

It’s human nature to seek explanations as a means of better understanding unknown subjects. We lean on explainability even more when the stakes are high. As recently concluded by two Dartmouth researchers, if the explanation is visually supported by pretty charts, we are partial to it. Explanations can give us a feeling of security when it comes to making informed decisions. Take, for example, a patient who asks a doctor for an explanation of a diagnosis. Even when the explanation is hard to grasp, the more scientific the doctor sounds, the better the patient may feel. It can be the same with AI. The more detail end users are given about how it works, the more likely they are to accept the outcome as valid and feel confident about doing so.

Are explanations sufficient? Some things are complex, and merely having an explanation is not, by itself, enough to derive utility from them.

And with many businesses considering avenues for AI adoption, we have to ask about the risks associated with relying so heavily on explainability. What if the explainer is not sufficiently knowledgeable? Users could be fed incorrect information without realizing it. What if users are not familiar enough with the topic to fully grasp the explanation? It is quite possible that when it comes to new topics like AI models, users such as business stakeholders, regulators, and even domain experts may end up with only a superficial understanding of the explanation provided. They may not be able to discern if and how the model was incorrect in the first place, which means that even with explanations, users can still end up making disastrous decisions.

In many use cases, a more accurate model is better than having an explanation. After all, what better evidence of utility is there than a model that gives the right outcome? Hence, we must ask whether we should be going after explainability, as is all the rage right now in XAI, or after truthfulness.

Truthfulness comes from accuracy measures, which give us an indication of how much reliance we can place on the system. Accuracy is directly linked to the quality of the underlying data, and over time data quality and accuracy progress hand in hand. Many AI models are used in dynamic settings where data drift is the norm, so asking crucial questions about the distributions of the training data and of out-of-sample data is elemental to having accurate models that can be relied on.
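
To make that kind of question concrete, here is a minimal sketch (not from the article) of one common drift check: comparing a feature’s training-time distribution against recent production data with a two-sample Kolmogorov–Smirnov test. The feature values and the alert threshold are invented for illustration.

```python
# Minimal sketch of a data-drift check: compare the training-time
# distribution of one numeric feature against recent production data.
# The synthetic data and the 0.01 threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Stand-ins for a feature column at training time vs. in production.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)  # drifted on purpose

statistic, p_value = ks_2samp(train_feature, prod_feature)

# A small p-value suggests the production distribution has shifted,
# which is a signal to re-validate accuracy before trusting the model.
if p_value < 0.01:
    print(f"Possible drift detected (KS={statistic:.3f}, p={p_value:.2e})")
else:
    print(f"No strong evidence of drift (KS={statistic:.3f}, p={p_value:.2e})")
```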

Forget explanations and reasoning for a moment and picture a system that can establish a high degree of truthfulness by means of doing well on a large test dataset across different real-world distributions. Seems too good to be true, right?
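
One way to picture it (a hypothetical sketch, not a claim about any particular system) is to report a model’s accuracy separately on each real-world slice of data rather than as a single pooled number; consistent performance across slices is stronger evidence of truthfulness than one aggregate score. The segments, labels, and predictions below are invented.

```python
# Minimal sketch of evaluating one model across several real-world data
# slices rather than on a single aggregate test set. All values are made up.
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions, tagged with the segment each
# example came from (e.g., region, customer type, time period).
segments = np.array(["north", "south", "north", "west", "south",
                     "west", "north", "south", "west", "north"])
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 1])

# Consistent accuracy across segments, on a large enough sample, is the
# kind of evidence of "truthfulness" described above.
for seg in np.unique(segments):
    mask = segments == seg
    acc = accuracy_score(y_true[mask], y_pred[mask])
    print(f"{seg}: accuracy = {acc:.0%} on {mask.sum()} examples")
```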

Let us examine this concept using a real-life scenario. Have you ever had to ask a colleague or friend for an explanation of how they recognized you in a nanosecond? No, because of the truthfulness of the outcome. It never crosses your mind to understand the “how,” because the end result is correct with a high degree of accuracy. Similarly in AI, when we transition to a phase where a model’s accuracy beats the human baseline, and we reach that high degree of accuracy, explainability will become less relevant.

So, what is the alternative to explainability? Simplified, business-friendly metrics. As AI practitioners, we need to recognize that it is difficult for non-practitioners to make sense of our analytical metrics, such as the F1 score, ROUGE, perplexity, BLEU, word error rate (WER), and the confusion matrix. We need a simplified, business-friendly metric that can be readily understood, like Google’s use of the Sensibleness and Specificity Average (SSA) score in its evaluation of Meena.1 While it may not be easy to develop simplified metrics in every instance, it’s imperative we do so whenever possible to limit the need for model explanations and ultimately lead to better decision-making for AI end users.
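
As a rough, hypothetical sketch of that translation (the labels, predictions, and plain-language rollup below are invented, not a published standard), a practitioner might compute the usual metrics and then report one figure a business stakeholder can readily act on:

```python
# Minimal sketch of turning practitioner metrics into a business-friendly
# figure. The data and the plain-language rollup are illustrative only.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Practitioner-facing metrics.
print("F1 score:", round(f1_score(y_true, y_pred), 3))
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))

# One possible business-friendly rollup: "how often was the model right?"
accuracy = accuracy_score(y_true, y_pred)
print(f"The model was correct on {accuracy:.0%} of the cases in this test set.")
```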

1. https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html

The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information. Fidelity and any other third parties are independent entities and not affiliated. Mentioning them does not suggest a recommendation or endorsement by Fidelity.

1065880.2.0

Vall Herard

CEO
Vall’s expertise is at the intersection of financial markets and technology with extensive experience in FinTech, RegTech, InsurTech, capital markets, hedge funds, AI, and blockchain. Vall previously worked at BNY Mellon, BNP Paribas, UBS Investment Bank, Numerix, Misys (now Finastra), Renaissance Risk Management Labs, and Barrie + Hibbert (now Moody’s Analytics Insurance Solutions). He holds an MS in Quantitative Finance from New York University and a BS in Mathematical Economics from Syracuse and Pace Universities, as well as a certificate in big data & AI from MIT.
