Four questions to ask when evaluating an AI tool

AI is everywhere. It is in our email systems, our phones, and our streaming services, and it is starting to appear in the tools we use to get our work done more quickly, more efficiently, and, hopefully, better. Many people are researching how to incorporate AI tools into their daily work. But if you are the one evaluating AI tools for your company, do you really know how to assess them, or which factors to consider when it comes to the AI powering each tool? Here are four questions to ask an AI provider that will help you understand whether an AI tool will really meet your needs and benefit your organization.

1. What types of AI are being used?

AI is a broad term that encompasses many techniques that enable machines to mimic human behavior, including rules-based logic, machine learning, and deep learning. As the diagram shows, AI is the most general, high-level term that encompasses the others.

Graphic: AI encompasses machine learning, which encompasses deep learning. AI: a technique which enables machines to mimic human behavior. Machine learning: a subset of AI which uses statistical methods to enable machines to improve with experience. Deep learning: a subset of machine learning which makes the computation of multi-layer neural networks feasible.

Often an AI application uses a combination of techniques to deliver the desired outcomes efficiently. For example, a tool that helps with language reviews might use the simplest technique, rule- and logic-based programming, in which humans define a list of problematic words or phrases to be flagged. That simple pattern matching might meet some business needs but not all, so another technique such as machine learning might be layered in for the more complex tasks. Machine learning enables computers to recognize complex patterns in data without humans having to explicitly describe every pattern of interest. The machine isn’t just taught; it learns. The trade-off is that it requires large data sets for training.
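As a rough illustration of that layering, here is a minimal Python sketch. The phrase list, the `review` function, and the stand-in scoring function in the usage example are all hypothetical; a real tool would use its own rules and a trained model rather than the placeholder shown here.

```python
import re
from typing import Callable, Optional

# Layer 1: a hypothetical list of problematic phrases that humans define.
FLAGGED_PHRASES = ["guaranteed returns", "risk-free", "can't lose"]


def rule_based_flags(text: str, phrases: list[str] = FLAGGED_PHRASES) -> list[str]:
    """Return each listed phrase found in the text (case-insensitive, whole words)."""
    found = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


def review(text: str,
           model_score: Optional[Callable[[str], float]] = None,
           threshold: float = 0.5) -> dict:
    """Combine simple rules with an optional learned model (hypothetical sketch)."""
    flags = rule_based_flags(text)
    # Layer 2: an optional model scores subtler issues the rules can't express.
    score = model_score(text) if model_score is not None else None
    needs_review = bool(flags) or (score is not None and score >= threshold)
    return {"rule_flags": flags, "model_score": score, "needs_review": needs_review}


print(review("This fund offers Guaranteed Returns."))

# Stand-in for a trained model; NOT a real classifier.
stub_model = lambda t: 0.9 if "always" in t.lower() else 0.1
print(review("Prices always go up.", model_score=stub_model))
```

The first call is caught by the rules alone; the second contains no listed phrase, so only the model layer flags it.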

Depending on the task, one technique or a combination of techniques might be needed. It is important to know which methods are being used because each has different implications for data requirements, computing power, accuracy, and human involvement. If a technique requires large data sets that you don’t have, the system probably isn’t right for you.

2. Where does the data come from?

Data is the foundation of any AI system. The adage “garbage in, garbage out” applies to AI models: poor-quality data can lead to incorrect results or negative outcomes, while high-quality data can provide actionable insights and meaningful predictions. But what determines data quality? Here are a few things to consider.

Quantity: in general, the more data, the better.
Accuracy: it’s vital that data be drawn from credible sources.
Bias: data should be representative of the target population.
Diversity: the data should cover a wide range of scenarios, edge cases, and variations.
Curation and pre-processing: proper preparation should help address inconsistencies, missing values, noise in the data, etc.
Timeliness: the data should be relevant to the current situation.
Privacy and security: sensitive information should be properly protected and anonymized.
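Some of these criteria can be checked mechanically before any model is trained. The Python sketch below, assuming records arrive as dictionaries, reports row count (quantity), rows with missing values (curation), how rows are spread across groups (a first look at bias and representativeness), and rows older than a cutoff (timeliness). The field names, sample data, and 365-day threshold are hypothetical; the accuracy of sources and privacy protections still need human review.

```python
from collections import Counter
from datetime import date


def data_quality_report(records, required_fields, group_field, date_field,
                        as_of, max_age_days=365):
    """Summarize a few simple data-quality signals for a list of dict records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to check")

    # Curation: rows with any required field missing or blank.
    incomplete = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )

    # Bias/representativeness: share of rows in each group (e.g., region).
    group_counts = Counter(r.get(group_field) for r in records)

    # Timeliness: rows last updated more than max_age_days before as_of.
    stale = sum(
        1 for r in records
        if r.get(date_field) is not None
        and (as_of - r[date_field]).days > max_age_days
    )

    return {
        "row_count": n,  # quantity
        "incomplete_rate": incomplete / n,
        "group_shares": {g: c / n for g, c in group_counts.items()},
        "stale_rate": stale / n,
    }


# Hypothetical sample data.
records = [
    {"region": "east", "amount": 10.0, "updated": date(2024, 5, 1)},
    {"region": "east", "amount": None, "updated": date(2024, 6, 1)},
    {"region": "east", "amount": 7.5, "updated": date(2021, 1, 15)},
    {"region": "west", "amount": 3.0, "updated": date(2024, 2, 10)},
]
report = data_quality_report(records, ["region", "amount"], "region",
                             "updated", as_of=date(2024, 7, 1))
print(report)
```

Here a quarter of the rows are incomplete, three quarters come from one region, and a quarter are stale. Each number is a prompt for a question to the provider, not a verdict on its own.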

AI is only as good as its data, so it helps to ask the right questions to really understand what data is feeding the AI that you are considering.

3. How are the models performing?

AI may seem like magic, but it isn’t. It is sophisticated probability and statistics; in other words, math. Understanding how models perform can be important in deciding whether they are right for the problem you are trying to solve.

There are several common metrics used to evaluate the performance of AI algorithms.

Accuracy measures how often the algorithm predicts the correct output overall: out of all its predictions, how many were right?
Precision can be thought of as a measure of exactness. It measures the quality of the positive predictions: when the model gave a positive result, how often was it correct? This is an important measure if false positives need to be avoided. For example, if an AI system is used to detect defects on a production line and each positive requires a person to review the piece, too many false positives would be expensive.
Recall can be thought of as a measure of completeness. It measures the proportion of actual positives that the model correctly identified. This matters when overlooked cases are costly. For example, if an AI system used to detect cancer misses many cases, the consequences could be catastrophic for the patients who are misdiagnosed.
F1 score is the harmonic mean of precision and recall, combining both into a single number. Because the harmonic mean is pulled toward the lower of the two values, a high F1 score means the model has both few false positives and few false negatives.
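These four definitions can be written out directly from counts of the four possible outcomes. The Python sketch below assumes binary labels (1 for positive, 0 for negative) and uses made-up sample labels; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def count_outcomes(actual, predicted):
    """Tally the four outcomes for binary labels (1 = positive, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    c = Confusion(0, 0, 0, 0)
    for a, p in zip(actual, predicted):
        if p and a:
            c.tp += 1
        elif p and not a:
            c.fp += 1
        elif not p and a:
            c.fn += 1
        else:
            c.tn += 1
    return c


def accuracy(c):
    # Share of all predictions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c):
    # Of everything flagged positive, how much was truly positive?
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c):
    # Of everything truly positive, how much did the model flag?
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c):
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
c = count_outcomes(actual, predicted)
print(c, accuracy(c), precision(c), recall(c), f1(c))
```

On these eight labels the model gets 6 of 8 right (accuracy 0.75), and precision, recall, and F1 all come out to about 0.67.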

Accuracy is a reasonable summary when positive and negative cases occur in similar numbers and both kinds of error cost about the same. F1 score is more informative when positive cases are rare or when false positives and false negatives are what matter most.

There are additional measurements not covered here, but whichever metric you use needs to reflect what matters for how you plan to use the AI. A model can have 95% accuracy, which sounds great, and still be a poor fit for your application if its precision is low, meaning it produces more false positives than you can tolerate.
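To make the 95% accuracy scenario concrete, here is the arithmetic for a hypothetical production-line audit. The counts are made up for illustration: 1,000 pieces, of which 10 are truly defective.

```python
# Hypothetical audit of 1,000 pieces, 10 of which are truly defective.
tp = 5    # defective pieces the model flagged
fn = 5    # defective pieces the model missed
fp = 45   # good pieces the model flagged anyway
tn = 945  # good pieces the model correctly passed

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.0%} precision={precision:.0%} recall={recall:.0%}")
# prints: accuracy=95% precision=10% recall=50%
```

Because 990 of the 1,000 pieces are fine, the model earns 95% accuracy largely by passing good pieces. Yet 45 of its 50 flags are false alarms that someone must review, and it still misses half the real defects.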

4. How is the AI getting smarter?

AI isn’t something you set and forget. It needs to be assessed continually to make sure it is still doing what it should and, ideally, getting smarter. But how does AI get smarter?

“Human in the loop” is one of the ways AI can continue to learn. The term simply means that humans are involved in the decision-making process, providing an additional layer of oversight. Take, for example, an AI tool that helps a service representative answer client questions. The AI may recommend a few options to the rep based on what it knows about the client’s question and the product features. As the rep selects some options and rejects others, the AI uses that feedback to learn what works and continues to improve over time. A feedback loop is an important part of any AI application.
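One very simple way to picture that feedback loop is to count how often reps accept or reject each suggestion and rank future suggestions by a smoothed acceptance rate. The Python sketch below is hypothetical: the class name, the suggestion labels, and the add-one smoothing (a prior of one accept and one reject) are illustrative choices, and real systems may instead retrain a model on the collected feedback.

```python
class FeedbackRanker:
    """Rank suggested answers by how often reps have accepted them (hypothetical sketch)."""

    def __init__(self, prior_accepts=1.0, prior_rejects=1.0):
        # Pseudo-counts so that unseen suggestions start at a neutral 0.5 score.
        self.prior_accepts = prior_accepts
        self.prior_rejects = prior_rejects
        self.accepts = {}
        self.rejects = {}

    def record(self, suggestion, accepted):
        """Store one piece of human feedback on a suggestion."""
        counts = self.accepts if accepted else self.rejects
        counts[suggestion] = counts.get(suggestion, 0) + 1

    def score(self, suggestion):
        """Smoothed acceptance rate for a suggestion."""
        a = self.accepts.get(suggestion, 0) + self.prior_accepts
        r = self.rejects.get(suggestion, 0) + self.prior_rejects
        return a / (a + r)

    def rank(self, suggestions, top_k=3):
        """Return the top_k suggestions, highest score first (ties keep input order)."""
        return sorted(suggestions, key=self.score, reverse=True)[:top_k]


ranker = FeedbackRanker()
options = ["reset_password_steps", "billing_faq", "escalate_to_specialist"]
print(ranker.rank(options))  # no feedback yet, so input order is kept

# The rep picks the billing FAQ twice and rejects the password steps once.
ranker.record("billing_faq", accepted=True)
ranker.record("billing_faq", accepted=True)
ranker.record("reset_password_steps", accepted=False)
print(ranker.rank(options))
```

After the feedback, the second call returns `['billing_faq', 'escalate_to_specialist', 'reset_password_steps']`: the accepted answer rises, the rejected one sinks, and the untouched one stays in the middle.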

When you are evaluating AI solutions, it helps to understand how much continued learning the system does and what human time and skills will be needed to support it.

Questions to ask an AI provider

Picking the right AI application for your business can be challenging, but it doesn’t need to be overwhelming, nor do you need to be an AI expert to make an informed decision. Four questions will help:

What types of AI are being used?
Where does the data come from?
How are the models performing?
How is the AI getting smarter?

These questions should help you understand whether the AI tool you’re considering can address your needs in a way that suits your organization.

For more, check out the white paper: Considering AI solutions for your business? Ask the right questions.