How analyzing unstructured data can improve your AML/KYC efforts

Five quintillion bytes of data. That’s how much personal data experts at McKinsey & Company suggest we digitally create every 48 hours. The majority of it is unstructured. This presents a challenge to financial organizations who only have the capability to scan structured data as part of their anti-money laundering (AML) activities or who can’t connect the dots between the output of multiple data scanning solutions. And it has real-world consequences.

In 2017, the U.K.’s Financial Conduct Authority (FCA) fined a large, global bank £163 million for failing to maintain an adequate AML control framework. In their press release, the FCA cited inadequate customer due diligence as one of the reasons for the action. The bank “failed to obtain sufficient information about its customers to inform the risk assessment process and to provide a basis for transaction monitoring.” In doing so, the bank’s actions exposed the U.K. system to the risks of financial crime.

In this example, the processes and solutions the bank had in place failed to meet Know Your Customer (KYC) obligations. As a result, customers were able to execute mirror trades that transferred billions of dollars through the bank to overseas accounts. If the bank had a robust AML /KYC program that included the ability to scan for and analyze a combination of structured and unstructured data, they could have potentially increased their chances of identifying bad actors and reducing the risk of financial crime. To understand why, it helps to understand the characteristics of structured and unstructured data.

What is structured data?

Structured data is data that is formatted in an organized way so that it is easily searchable. It’s the type of data that typically populates databases. Because it is stored in an organized format, software applications that use basic tools like rules-based queries or keyword matching capabilities can easily read it.

Examples of structured data financial organizations may query include customer information, transaction records, financial statements, watchlists, sanctions lists, most-wanted lists, and lists of Politically Exposed Persons (PEP). Financial firms may use keyword matching or core matching tools to search structured data as part of their AML/KYC programs. The tools read the database and output data that is an exact match to their query.

What is unstructured data?

Unstructured data is unorganized and can take place in real time. Because of these characteristics, it’s messy and not easily searchable. Emails, social media posts, chatbot conversations, customer service interactions, scanned documents, images, webpage text, and videos all contain unstructured data. Financial organizations may scan and screen these important data sources as part of an adverse media screening program. However, the quantity of unstructured data is vast, making adverse media screening a challenge. Even if the amount of scanned data is less than five quintillion, it’s more than any compliance team could humanly analyze.

Unstructured data is also context dependent. Meaning, to search unstructured data, one must analyze the words, phrases, and images surrounding that data to identify contextually relevant information that a risk or compliance officer needs to know. This is an activity that humans perform very well, but not at a scale that can keep pace in today’s digital-first environment.

The table below further outlines how unstructured data is characteristically different from structured data.

Structured	Unstructured
Limited pool of data	Abundance of data
Predefined fixed format, often from registries or prepopulated lists	Multitude of formats, including videos, chats, images, text on web pages, social media, business or legal documents, data within emails
Easy to store and manage	Messy
Static, can be dated	Active, real-time
Incomplete insights if it’s the only source of screening data relied upon	Efficiently surfaces countless insights when using LLMs

Learn how AI is helping financial organizations screen multiple data types. Download the white paper → Mitigating risk in the digital age:
A roadmap to AI-enhanced adverse media screening

The challenges in analyzing data for AML/KYC programs

Many financial organizations only employ rules-based tools, keyword matching, or core matching technology for their AML/KYC programs, and that’s a problem. They are limiting their scope to structured data only, even though more than 80% of the world’s data is unstructured. Besides missing out on the discovery of potential bad actors from a majority of online data, searches based on structured data can be inefficient. They can produce false positives, as structured data, by nature, can be static, dated, incomplete, and can’t understand context.

Unstructured data, which represents 80% of internet data, contains valuable information that many other providers may not be able to capture.

Profile-based screening tools can help boost an organization’s scanning capabilities, but they have drawbacks, as well. They can be used to scan for unstructured data as a complement to core matching tools, but unless they can also determine the context of the data, the output can be unmanageable. For example, an organization that queries “John Smith,” may get semi-good results for structured lists, but also every publicly available article that includes the name “John Smith.”

That’s why financial organizations should consider advanced artificial intelligence (AI), machine learning (ML), and large language model (LLM) tools that can scan and analyze unstructured data for adverse media or other important information. These tools can supply the always-on, advanced technology organizations need to screen large quantities of data, identify relevant information, and—importantly—understand the context of that information in real time. Contextual understanding is important because it can help reduce the number of false positive alerts and can be more effective at determining whether a flagged issue is relevant to a specific organization/entity.

How AI and LLM solutions are helping financial organizations reduce risk

Advanced AI-based tools are helping financial organizations scan for and analyze structured and unstructured data to help create more complete and accurate customer profiles. They are especially useful in screening unstructured data for context and relevance as part of an adverse media screening process. For example, advanced AI tools can help determine if a news story is positive or negative.

“The advantage of applying AI capabilities to adverse media screening, in particular, is that you have greater opportunity to discover the potential bad actors that have systematically been mistaken for good actors in the past,” explains Harsh Pandya, Head of Product at Saifr. “In my experience, using this approach, we typically discover two to eight times more risk than previously known, serving up more accurate alerts at the soonest possible time.”

Organizations are taking advantage of AI’s advanced analytics, ML, LLMs, natural language processing (NLP), and powerful computational capabilities to scan for and analyze information from both structured and unstructured data resources. Collectively, these capabilities are giving compliance teams deeper insight into the context of flagged data and greater ability to search for data across online sites 24/7, globally, and in different languages.

AI is becoming an industry best practice for fraud and risk protection

As financial organizations continue to service global customers on interconnected, digital-first platforms, bad actors will persist in inventing ways to use available technology and operational frameworks to try to defraud customers or launder money. AI can help institutions enhance the efficiency, accuracy, and robustness of their fraud and risk management operations. By applying AI to improve screening and analysis across both structured and unstructured data, organizations are streamlining their adverse media screening and KYC monitoring capabilities and improving their chances of detecting potential bad actors more quickly.

Are you getting the best results from your screening solutions?

Before you undertake your customer due diligence, do your own internal due diligence to determine if there are gaps in your processes. Ask your team these questions:

Do we have the tools to scan and analyze structured and unstructured data together in real time?

Are we able to search and analyze online sources in different languages?

Are we using advanced AI to monitor sources for suspicious activities?

Are we detecting potential threats as early as possible?

Learn why it's important to ask these questions in the white paper, Mitigating risk in the digital age: a roadmap for AI-enhanced adverse media screening.

The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information.

1157102.1.0

SaifrScreen for AML/KYC

SaifrScreen for trust & safety

SaifrReview for financial services

SaifrReview for life insurance and annuities

Saifr eComms

Microsoft

ServiceNow

Adobe

Ebooks

White papers

Case studies

Blog

Videos

Client support

FAQs

About Saifr

Press

Careers

Contact

How analyzing unstructured data can improve your AML/KYC efforts

What is structured data?

What is unstructured data?

The challenges in analyzing data for AML/KYC programs

How AI and LLM solutions are helping financial organizations reduce risk

AI is becoming an industry best practice for fraud and risk protection

Jon Elvin

Check out our latest blogs

Regulatory AI’s Expanding Role in AML/KYC

Team Saifr

2026 Trends: AI and Compliance in Financial Services

Team Saifr

AI regulation is everywhere…including at the state level

Allison Lagosh

Join our email list for the latest news and events in regulatory compliance and AI

Company

Solutions

Legal