
Navigating the challenges of AI integration in financial services

The integration of RAG in AI systems holds immense potential for the financial services sector, but must be handled with diligence.

In February 2023, roughly three months after the launch of ChatGPT's inaugural version, I found myself intrigued by a podcast during my routine walk. The host, known but not famous, detailed a recent encounter with ChatGPT, where he posed a personal question. The chatbot's response was elaborate, convincing, yet ultimately incorrect—a textbook example of an AI hallucination. This phenomenon typically occurs when a language model, lacking appropriate guardrails, is asked about subjects or individuals on which it was not thoroughly trained, especially those that are not widely known. The episode piqued my interest, and for several days I contemplated the severity of the issue. The outcome was a blog: Where truth can become blurry: explaining factuality in AI chatbots. There, I touched upon retrieval augmentation, a then-nascent research area promising significant potential to curb such hallucinatory tendencies.

Jumping ahead to 2024, Retrieval-Augmented Generation (RAG) has evolved into a crucial element of most text-generating AI systems. This progression was not accidental but a direct response to the hallucination problem that I explored in my blog. The objective of integrating RAG is to root the knowledge of a language model in authoritative information supported by reliable sources. Behind the scenes, the AI system, prior to responding to a prompt, enriches it with pertinent snippets of text sourced from the internet or a private database. Equipped with RAG, the AI system can reference sources, allowing users to cross-check assertions and consequently diminish the frequency of hallucinations.
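To make the mechanics concrete, here is a minimal sketch of the retrieve-then-generate pattern in Python. The documents, the keyword-overlap scoring, and the prompt template are illustrative stand-ins; a production system would use an embedding model and a vector store rather than this toy retriever.

```python
# Minimal sketch of the RAG pattern: retrieve relevant snippets, then
# prepend them to the prompt so the model can ground (and cite) its answer.
# The documents and the scoring below are illustrative stand-ins for a
# real vector store and embedding model.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Enrich the user's question with retrieved, citable context."""
    snippets = retrieve(query, documents)
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "Net revenue rose 8% in FY2022 driven by advisory fees.",
    "The board authorized a $500 million share repurchase program.",
    "Headcount grew 4% year over year.",
]
prompt = build_prompt("What share repurchase program was authorized?", docs)
```

Because the numbered sources travel with the question, the model's answer can point back to them, which is exactly what lets users cross-check assertions.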

The financial services sector is poised to reap significant rewards from the integration of RAG into generative AI. A study by consulting firm McKinsey estimates that the potential annual gains from boosted productivity in banking could exceed $200 billion. The report also projects potential opportunities in other financial sectors, such as asset management, wealth management, and investment banking, to reach at least $25 billion each. This is hardly surprising. Armed with RAG, AI-generated content can be more comprehensive, transparent, credible, and current (recall that, without guidance, language model outputs are akin to snapshots of the past). These characteristics are critical in an industry where content directly influences customers.

The importance of compliance in AI-generated content

Let's suppose that content is comprehensive and hallucinations are minimized—should content creators unquestioningly rely on AI-generated content and promptly deliver it to clients? The answer is a resounding no, especially in the financial services sector, where creators should tread even more carefully. Tasks in this sector require a unique skill set, including a proficiency in financial calculations, an understanding of company and regulatory policies, and a deep grasp of fundamental financial concepts. Consider a chatbot that interacts directly with customers. Given company financial statements and additional context from related documents, a user might inquire about “the total cost of share repurchases in a specific year, and the reason for any decrease or increase from previous years.” Besides adhering to company and regulatory policies, the system would need to perform calculations and comprehend financial and accounting concepts to provide a valuable response. RAG alone is not equipped to handle such a request.

A new benchmark dataset, FinanceBench (Vidgen, 2023), serves as a valuable tool for evaluating these capabilities. This dataset incorporates data from 40 companies, spanning 361 public filings issued between 2015 and 2023. The filings, including 10Ks, 10Qs, 8Ks, and earnings reports, provide the factual basis for answers, aligning with the RAG workflow. When creating an entry in this dataset, evaluators formulate a question, provide an answer, and include a contextual excerpt of text from which the answer can be inferred. This excerpt is selected from the 361 public filings. For example, a question such as “what is the cost of goods sold in USD for company X in FY2022?” would be paired with its respective answer—let's say “$Y million”—and several paragraphs or the table from which Y was calculated. In assessing a language model's performance within a financial context, the model is presented with both the question and the context as a prompt. The challenge for the model is to produce the correct answer.

FinanceBench is distinctive in that it eliminates the RAG component during the evaluation process. Essentially, when a language model is given a question, the context is automatically supplied, whereas in reality, RAG would first need to retrieve it from a multitude of documents. Consequently, this benchmark seeks to answer the following question: Assuming RAG operates at 100% efficiency, how adept are the current language models at answering questions within a financial services framework? An even more crucial follow-up question is: How adept are these models at staying compliant while they are at it? OpenAI's GPT-4 leads the pack, correctly answering 8 out of 10 questions. However, its accuracy dips to 50% for calculation-intensive questions.
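The evaluation setup described above can be sketched as follows. The entry fields, the stub model, and the substring-match scoring are assumptions made for illustration, not the benchmark's actual harness; the point is that the context is handed to the model directly, so retrieval quality is factored out.

```python
# Sketch of a FinanceBench-style evaluation loop. The entry fields and
# the stub model are illustrative; the real dataset pairs each question
# with a gold answer and the filing excerpt needed to derive it.
from dataclasses import dataclass

@dataclass
class EvalEntry:
    question: str
    gold_answer: str
    evidence: str  # excerpt from a public filing

def make_prompt(entry: EvalEntry) -> str:
    # The benchmark supplies the context directly, bypassing retrieval,
    # so only the model's reasoning over the excerpt is measured.
    return f"Context:\n{entry.evidence}\n\nQuestion: {entry.question}\nAnswer:"

def evaluate(model, entries: list[EvalEntry]) -> float:
    """Score a model by whether the gold answer appears in its response."""
    correct = sum(
        entry.gold_answer.lower() in model(make_prompt(entry)).lower()
        for entry in entries
    )
    return correct / len(entries)

entry = EvalEntry(
    question="What is the cost of goods sold in USD for company X in FY2022?",
    gold_answer="$2,100 million",
    evidence="FY2022 cost of goods sold was $2,100 million, up from $1,900 million.",
)
stub_model = lambda prompt: "Based on the filing, COGS was $2,100 million."
accuracy = evaluate(stub_model, [entry])
```

Swapping the stub for a real language model call is all that separates this sketch from a usable harness, which is what makes the "assume retrieval is perfect" framing so clean.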

Most financial services firms tread carefully around potential data leaks. Instead of risking data privacy issues by transmitting their data to OpenAI via APIs, they tend to favor open-source language models that they can host privately. On this benchmark dataset, the top-performing open-source language model is Mixtral, developed by Mistral AI, which correctly answers 6 out of 10 questions overall and 4 out of 10 calculation-intensive questions. Clearly, whether a generative AI system is driven by the current industry leader, OpenAI, or by open-source models, extra diligence is necessary to ensure the precision of the generated content, to say nothing of its compliance.


Assuming that the generated content is accurate, can we also assume it's compliant? Not necessarily. Compliance within the financial services sector frequently requires adherence to intricate regulations that go beyond factual accuracy, encompassing aspects of fairness, privacy, and transparency. For instance, the recent AI Act introduced by the European Union bans manipulative or deceptive techniques in AI models. Similarly, FINRA Rule 2210 prohibits marketing content that includes misleading or exaggerated claims. Therefore, the task of assessing the compliance level of a generative AI system falls beyond the purview of FinanceBench. This prompts the question: What are some key next steps for financial services organizations?

Strategic approaches to managing AI in financial services

At the highest level, financial services organizations should consider constructing a strategy based on three pillars. First, stay informed on acts and regulations that affect AI and financial services. Second, establish practical guidelines. And third, explore continuous monitoring systems for AI content that can be combined with a human in the loop before anything reaches the consumer.

First, staying current with regulations is an enormous task. The traditional method involves hiring a team of lawyers or consultants to wade through regulatory documentation and track changes over time. However, recent advancements in AI can offer a more efficient approach: a smaller team utilizing AI’s summarization techniques to monitor changes, thus reducing the number of documents to process and the staff needed. The optimal approach could be to employ an even smaller team supervising AI agents tasked with constantly reviewing regulatory documents and highlighting pertinent changes.

AI agents can be ideal for tracking changes in regulations due to their ability to operate autonomously with minimal human guidance. A basic version of such an agent might target a specific FINRA rule, like FINRA 2210. The agent could be configured to regularly scan relevant FINRA websites and official news sources for updates. A language model could be used to detect changes in the text, or even trained to specifically note changes in regulations, triggering a notification system to alert stakeholders. In practice, a developer might also add security measures, a customizable user interface, and a feedback page for expert learning.
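A toy version of such an agent's change-detection loop might look like the following. The watched URL, the fetcher, and the notifier are hypothetical placeholders; a real agent would add scheduling, authentication, and a language model step to summarize what actually changed before alerting stakeholders.

```python
# Sketch of a change-detection loop for a regulatory page. The URL, the
# fetch callable, and the notify callable are hypothetical placeholders.
import hashlib

WATCHED_PAGES = {
    "finra-2210": "https://www.finra.org/rules-guidance/rulebooks/finra-rules/2210",
}

def fingerprint(text: str) -> str:
    """Stable hash of the page text, used to detect edits between scans."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def scan(rule_id: str, fetch, state: dict, notify) -> bool:
    """Fetch a page, compare it to the last known version, alert on change."""
    current = fingerprint(fetch(WATCHED_PAGES[rule_id]))
    previous = state.get(rule_id)
    state[rule_id] = current
    if previous is not None and previous != current:
        notify(f"{rule_id}: page changed; route to compliance review")
        return True
    return False

# Simulate three scheduled scans with a stub fetcher: the first scan sets
# the baseline, the second sees no change, the third triggers an alert.
pages = iter(["Rule text v1", "Rule text v1", "Rule text v2"])
fetch = lambda url: next(pages)
state, alerts = {}, []
results = [scan("finra-2210", fetch, state, alerts.append) for _ in range(3)]
```

Hashing whole pages is deliberately crude: it trades precision for simplicity, which is why the notification routes to a human reviewer rather than acting on its own.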

Second, when establishing guidelines for AI-generated content, organizations should consider focusing on two areas: robust data governance and setting boundaries for acceptable content. A solid data governance process helps ensure quality data is available for both grounding and fine-tuning language models. As previously mentioned, RAG is fundamental for grounding most generative AI systems. The effectiveness of RAG relies on the quality of its data sources. If these sources contain personal information, there could be concerns around data privacy and security.

By refining language models with high-quality, diverse datasets, organizations can focus on narrower, more specific domains, reducing the risk of generating non-compliant text. Organizations might also consider generating only specific types of content to avoid stepping into unfamiliar territory. For instance, the focus could be on short marketing blogs rather than long white papers. Regardless, standards for properly labeling AI-generated text should always be maintained.

Lastly, monitoring AI content requires both technical and human oversight. On the technical side, this involves deploying advanced algorithms designed to monitor and assess the compliance of AI-generated content. These could be NLP models trained for specific tasks, such as scanning for exaggerations in text. As these models are generally small and task-specific, they can efficiently process generated content and offer higher accuracy than their general-purpose counterparts.
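As a stand-in for such a task-specific model, here is a simple keyword screen that flags promissory or exaggerated language for human review. The pattern list, threshold-free design, and example draft are assumptions for demonstration; a production system would use a trained classifier rather than regular expressions.

```python
# Illustrative stand-in for a task-specific compliance model: a keyword
# screen that flags promissory or exaggerated language for human review.
# The pattern list below is an assumption, not a regulatory standard.
import re

EXAGGERATION_PATTERNS = [
    r"\bguaranteed\b",
    r"\brisk[- ]free\b",
    r"\balways outperforms\b",
    r"\bnever lose\b",
]

def flag_exaggerations(text: str) -> list[str]:
    """Return the patterns that matched, for routing to a human reviewer."""
    return [p for p in EXAGGERATION_PATTERNS if re.search(p, text, re.IGNORECASE)]

draft = "Our fund offers guaranteed, risk-free returns."
flags = flag_exaggerations(draft)
```

Matches are surfaced rather than auto-redacted, which mirrors the oversight model discussed next: small models triage, humans decide.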

Despite their efficacy, task-specific models cannot fully replace the nuanced understanding and judgement of human experts. Human oversight is therefore essential in interpreting the findings of these models, providing context, and making final decisions on the content. This may involve legal experts reviewing flagged content for potential copyright or regulatory issues, marketing teams ensuring content aligns with the brand voice and ethics, and compliance officers overseeing the entire content generation and review process.

Wrapping up

In conclusion, the integration of Retrieval-Augmented Generation in AI systems holds immense potential for the financial services sector, promising enhanced productivity and potential gains. However, the use of AI-generated content must be handled with diligence, considering its accuracy and compliance with intricate financial regulations.

Financial service organizations should consider adopting a three-pronged strategy: staying informed about relevant regulations, establishing practical guidelines for AI-generated content, and implementing a robust monitoring system. This involves harnessing AI's potential to track regulatory changes, ensuring sound data governance, and employing both technical and human oversight to monitor AI content for compliance and quality. As the AI landscape continues to evolve, these measures will be crucial in more safely leveraging AI's capabilities and navigating its challenges.

If you're curious to know how compliance and marketing professionals at US financial institutions are using AI, download the ebook, AI insights survey: Adopters, skeptics, and why it matters.


The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information. Fidelity and any other third parties are independent entities and not affiliated. Mentioning them does not suggest a recommendation or endorsement by Fidelity.

The information regarding AI tools provided herein is for informational purposes only and is not intended to constitute a recommendation, or development or security assessment advice of any kind. Consider your own use case carefully and understand the risks before utilizing a generative AI tool.

1149831.1.0

Last Feremenga

Director, Data Science
Last is one of Saifr's AI experts, specializing in natural language processing (NLP) and regulatory compliance in financial services, and currently heads the AI-Applied Research Team at Saifr. He has an extensive background in research and has previously built NLP-based AI systems for risk mitigation.
