
Are you using artificial intelligence (AI) or simple pattern matching?

Learn the benefits and limitations of lexicon detection and AI and how they can help financial services companies create compliant, on-brand content.

A guide for nontechies

Companies in the financial services industry need to comply with regulations in their communications to help protect the public against risk and fraud. For example, they can’t say “buy our product for superior returns!” Regulations prohibit statements that are exaggerated, unwarranted, misleading, and/or promissory. In addition to creating compliant content, marketers also want to make sure they are adhering to brand guidelines. So, how do you create content efficiently?

Many companies are using tools to help with compliant, on-brand content. Many tools on the market today profess to use AI. AI is an umbrella term used for rule- and logic-based programming as well as advanced learning-based systems. While rule- and logic-based methods are good for some problems, they may not be ideal for the complex regulatory environment since they can’t provide meaningful guidance on any content that hasn’t been specifically defined.

Advanced learning-based systems, on the other hand, learn to recognize complex patterns in data without humans having to explicitly describe all the patterns of interest. This type of AI requires large amounts of data to train the models to “understand” the text to catch more complex risks. Let’s explore both.

Lexicon detection

Lexicon detection is great for specific use cases, and Saifr™ uses it too. Lexicons are rules-based models that perform simple word comparisons. They draw on lists of a few hundred to a few thousand potentially problematic words and/or phrases.

If a listed word such as “superior” is used, the system will flag it wherever it appears. The system can’t understand context such as quotes or negating words. As a result, it produces many false positives, as the following examples of “superior” show.

In some cases, it is totally innocuous: 

“Come to our seminar in Superior, AZ.”

If it’s promissory in nature, it can be balanced with words like “may” or “seeks to,” or used with “not”: 

“With muni bond yields at historic lows, preferred securities may provide superior income before and after taxes with potentially less volatility.”

“Indexing small-growth stocks isn’t unwise, but it shouldn’t be regarded as the clearly superior approach either.”

A question can moderate the promissory nature: 

“Want to find success? Focus on providing superior customer service.”

It can be used in a quote and be compliant: 

“It is impossible to produce superior performance unless you do something different from the average,” said John Doe.

All the examples above are typically acceptable to include in communications. Yet a lexicon system would have highlighted them as potential risks to be reviewed. When there are too many false positives, users get frustrated, don’t trust the system, and simply stop using it. The tool no longer provides the promised efficiency gains.
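The behavior described above can be sketched in a few lines of Python. This is a minimal illustration only, not Saifr's implementation: the one-word lexicon and the function name `flag_terms` are hypothetical, and some of the sample sentences are shortened versions of the examples above.

```python
import re

# Hypothetical one-word lexicon; real lists hold hundreds to thousands of entries.
LEXICON = {"superior"}

def flag_terms(text, lexicon=LEXICON):
    """Return every lexicon word found in text, ignoring case and context."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in lexicon]

EXAMPLES = [
    "Come to our seminar in Superior, AZ.",
    "Preferred securities may provide superior income before and after taxes.",
    "It shouldn't be regarded as the clearly superior approach either.",
    "Want to find success? Focus on providing superior customer service.",
    "Buy our product for superior returns!",
]

for sentence in EXAMPLES:
    # Every sentence is flagged, whether or not it is actually a problem.
    print(flag_terms(sentence), sentence)
```

Because the function compares words and nothing else, the town name, the hedged claim, the negated claim, and the question are flagged exactly like the promissory statement on the last line.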

To help address the lack of context with single words, a lexicon model might try to create an extensive list of phrases, such as “may provide superior” or “seeks to provide superior.” But this solution can be subjective and incomplete, and it can quickly become unwieldy. Many patterns need to be generated, yet the more patterns there are, the more matches there are that still lack context. To judge a match, the system would need to consider the full sentence containing the pattern as well as the sentences around it, and each pattern would then need to be marked okay or not. Adding more and more patterns only makes things worse. Again, users get frustrated, don’t trust the system, and simply stop using it.
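A rough Python sketch of the phrase-list workaround shows why it struggles to keep up. The phrase list and the function name `flag_with_exceptions` are hypothetical and chosen only for illustration.

```python
import re

LEXICON = {"superior"}
# Hypothetical hand-written exceptions; each new context needs another entry.
ALLOWED_PHRASES = ["may provide superior", "seeks to provide superior"]

def flag_with_exceptions(text):
    """Flag lexicon words unless they occur inside an allowed phrase."""
    lowered = text.lower()
    for phrase in ALLOWED_PHRASES:
        # Blank out approved phrases so the word inside them is not flagged.
        lowered = lowered.replace(phrase, " ")
    return [w for w in re.findall(r"[a-z]+", lowered) if w in LEXICON]

print(flag_with_exceptions("Preferred securities may provide superior income."))
print(flag_with_exceptions("Preferred securities might offer superior income."))
print(flag_with_exceptions("Come to our seminar in Superior, AZ."))
```

Only the first sentence passes. Changing “may provide” to “might offer” brings the false positive back, and the town name is still flagged, so the list of exceptions has to keep growing.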

On the plus side, lexicon applications typically take less effort to build and deploy than machine learning models. Lexicons do not require human labeling of thousands or millions of documents and therefore are not sensitive to the quantity and quality of a training data set. They also require less computing power since they are simply matching words and phrases.

Artificial intelligence

Machine learning is a way of programming computers to recognize patterns in data without humans having to explicitly describe all the patterns of interest. There are many categories of machine learning, two of which are supervised learning and unsupervised learning. Supervised learning trains the computer on a large set of carefully labeled data. Unsupervised learning trains an algorithm to find similarities or abnormalities in a data set.

There are instances, however, when these approaches alone are not sufficient. For example, in compliance risk detection, there are more subtle patterns that are likely not present in even the most expertly curated training data. To overcome this data bottleneck in developing generic models, Saifr uses a combination of advanced approaches such as self-supervised, unsupervised, and supervised learning to help meet business objectives. Saifr also includes a feedback loop allowing the system to learn from users and become more efficient over time.

These techniques require large volumes of high-quality, labeled data that few organizations have, or are willing to spend the significant funds needed to build. Saifr began with access to millions of documents from a large, diversified financial services company with multiple business lines. These documents span over fifteen years of expertly curated work from thousands of marketing and compliance experts. These data, and more from other independent sources, serve as raw ingredients for Saifr’s models. By using this massive amount of labeled and unlabeled data from industry and regulatory bodies, Saifr’s AI models are well equipped to handle new scenarios. Saifr’s ability to generalize allows it to overcome many of the shortcomings found in rules-based lexicons.

If you are looking for a tool that will help your content creation and compliance teams to work more efficiently, you need to understand the accuracy that it provides. Keyword and phrase detection can result in too many false positives, reduced usage, and lengthened time to market. Saifr has the data and the expertise to produce robust AI models that provide actionable insights to help create more compliant content faster. Users can grow to trust Saifr’s output and use it to help create on-brand, compliant content more efficiently.


Sara Walker

Content Marketing Associate
Sara has a background in numerous word-related fields, including nonprofit communications, literary blogging, community media, English tutoring, and now content marketing. She holds a BA in English from Arizona State University.
