Over the last few years, artificial intelligence (AI) development and usage have consolidated around large language models (LLMs). They are all the rage and are helping organizations achieve real results. But because they are trained on the internet and other data that already exists, they can only reflect those past experiences. They are pretty cool, yes, but also limited.
Historically, LLMs have generated content in only one direction, which is part of why they can hallucinate. LLMs know which words tend to be associated with other words. They may seem smart, but behind the scenes they are just predicting the most likely next word based on the previous chain of words. So if they generate the word "cat" or "car" as they write, what follows will differ based on how those words have been used before. Thus far, they haven't had a backspace: they can't go back and correct themselves if they meant "car" and not "cat."
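To make the "predict the next word" idea concrete, here is a minimal sketch using a toy bigram model. The corpus, the greedy word choice, and the prompts are made-up assumptions for illustration; real LLMs use neural networks over tokens rather than word counts, but the one-directional generation loop has the same shape: each step appends a word and never revisits earlier choices.

```python
from collections import Counter, defaultdict

# Toy training data (an assumption for this sketch, not real LLM training data).
corpus = "the cat chased the mouse the car needs new tires".split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def generate(start: str, length: int = 5) -> list[str]:
    words = [start]
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        # Greedily pick the most likely next word; there is no "backspace"
        # step that goes back and corrects earlier words.
        words.append(candidates.most_common(1)[0][0])
    return words

print(generate("cat"))  # e.g. ['cat', 'chased', 'the', ...]
print(generate("car"))  # e.g. ['car', 'needs', 'new', 'tires']
```

Starting from "cat" versus "car" produces different continuations purely because of how each word appeared in the training data, which is the behavior described above.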
In addition, LLMs have often faltered when presented with more complex prompts—they can’t break problems into tasks or use tools. As smart as they seem, they can’t add big numbers together, tell the time, or tell you the weather since they can’t consult a calculator, watch, or weather app. The latest AI models are different.
Agentic AI starts with an LLM and adds planning, tools, and the ability to remember what it did before.
Within planning, the AI can be self-critical: it can reflect on what it generated and evaluate it. In practice, this involves the AI attempting something five, six, or ten times, a subject matter expert (SME) setting up a grading system that tells it which attempt was closest to the correct answer, the AI picking the best one, and then remembering that process. Over time, the AI will learn the process and automatically correct many of its mistakes. In this planning phase, the system also breaks the problem into smaller tasks and sets sub-goals if needed. The whole process gives the appearance of reasoning.
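A minimal sketch of that "generate several attempts, grade them, keep the best" loop is below. The generate_candidate() and grade() functions are hypothetical stand-ins: in practice the candidates would come from an LLM and the grading rubric would be designed by an SME.

```python
import random

def generate_candidate(prompt: str) -> str:
    # Placeholder for an LLM call; here it just returns a numbered draft.
    return f"draft {random.randint(1, 100)} for: {prompt}"

def grade(candidate: str) -> float:
    # Placeholder for an SME-defined rubric (e.g., accuracy, completeness).
    return random.random()

def best_of_n(prompt: str, n: int = 6) -> tuple[str, list[tuple[str, float]]]:
    # Try the task n times and grade each attempt.
    attempts = [(c := generate_candidate(prompt), grade(c)) for _ in range(n)]
    best = max(attempts, key=lambda pair: pair[1])
    # Returning the graded attempts is the "remembering the process" part:
    # they can be stored and reused to improve future answers.
    return best[0], attempts

answer, history = best_of_n("Summarize the quarterly report", n=6)
print(answer)
```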
The next step involves using whatever tools it needs to accomplish those tasks. The tools might include calculators, weather apps, airline booking systems, other focused AI models, and so on.
At the end, the AI remembers what it did wrong and what it did successfully, and it can try to avoid making the same mistakes again. This process enables AI to solve more complex, multi-step problems. This new AI offers what is called "chain-of-thought reasoning."
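Putting the pieces together, here is a minimal sketch of such an agentic loop: break the problem into tasks, call a tool for each task, and keep a memory of what worked. The task list, the tool registry, and the run_task() logic are illustrative assumptions, not any specific product's implementation.

```python
from datetime import datetime

# A tiny tool registry: a calculator and a clock (illustrative only).
TOOLS = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
    "clock": lambda _=None: datetime.now().isoformat(timespec="seconds"),
}

memory: list[dict] = []  # record of past steps, successes, and failures

def run_task(task: str, tool_name: str, tool_input):
    tool = TOOLS[tool_name]
    try:
        result = tool(tool_input)
        memory.append({"task": task, "tool": tool_name, "ok": True, "result": result})
        return result
    except Exception as err:
        # Remembering failures lets the agent avoid repeating the same mistake.
        memory.append({"task": task, "tool": tool_name, "ok": False, "error": str(err)})
        return None

# A larger request decomposed into smaller, tool-backed tasks.
plan = [
    ("add two large numbers", "calculator", "123456789 + 987654321"),
    ("check the current time", "clock", None),
]
for task, tool_name, tool_input in plan:
    print(task, "->", run_task(task, tool_name, tool_input))
```

The LLM alone could not reliably add those numbers or tell the time; delegating each sub-task to a tool and recording the outcome is what the planning-tools-memory combination adds.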
This "chain-of-thought reasoning" is what o1, aka Strawberry, offers. And I predict it may well be the next step in exponential growth for AI. While it can't eliminate hallucinations entirely, they are likely to occur less often, and that can help with adoption as folks learn to trust AI more.
Agentic AI, with chain-of-thought reasoning, should help with an issue LLMs will face in the next three to five years: we are running out of data! Data has been of the utmost importance because, thus far, the more training data you give LLMs, the more accurate they get. All the data on the internet is projected to run out before AI is projected to be able to solve critical business problems.
Because of this new reasoning ability, AI will likely need less data to produce better results. Accurate models can be built to solve narrowly focused problems with what I'll call "high-quality, textbook data." Coupled with agentic AI, such models could significantly reduce the errors you see with current LLMs.
For example, consider how current LLMs have been trained to code. They have ingested virtually all the open-source code in the world. As you can imagine, there are many, many ways to write code for a given task, some much better than others. Imagine if you started with a data set focused only on the best code.
Start with a small "textbook" documented by SMEs that shows as clearly as possible how to write good, efficient code. Next, have generative AI create lots of examples of good code, then have SMEs review them and add only the high-quality ones to the master book. The resulting models, trained on this curated, focused book, will likely do better than the previous LLMs trained on all code. But why?
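Here is a minimal sketch of that curation loop: generate candidate examples, have SMEs keep only the high-quality ones, and grow a small "textbook" dataset. The generate_examples() and sme_approves() functions are hypothetical stand-ins for an LLM and human review, respectively.

```python
def generate_examples(topic: str, n: int) -> list[str]:
    # Placeholder for an LLM generating candidate code examples on a topic.
    return [f"example {i} of good code for {topic}" for i in range(n)]

def sme_approves(example: str) -> bool:
    # Placeholder for SME review; a trivial check stands in for a real
    # quality judgment by a subject matter expert.
    return len(example) > 10

# The "textbook" starts with material written by SMEs.
textbook: list[str] = ["seed example written by an SME"]

for topic in ["sorting", "error handling"]:
    candidates = generate_examples(topic, n=5)
    textbook.extend(c for c in candidates if sme_approves(c))

# The resulting small, curated dataset is what a focused model would be
# fine-tuned on, instead of all code on the internet.
print(len(textbook), "curated examples")
```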
An analogy: it's like reading lots of books containing a mix of correct and incorrect information when all you really need is one book with all the right information. Couple that with AI that can reason, is self-critical, corrects its mistakes, and remembers them, and the results can be far more powerful than the LLMs we have had until now.
And these small, focused models should not only perform better, but also be less expensive to run and maintain, with lower latency.
While I do think smaller, focused models will be the trend in 2025, there will always be room for larger LLMs. Yet there is a gap between the larger and smaller models: what is missing is the safety layer.
There is broad recognition by companies engaged in the development and implementation of AI that regulations, such as the EU AI Act, may impact how AI is being used. There is also recognition that AI outputs should be compliant with existing laws—but most LLMs today do not specifically take the relevant rules and regulations into account. In the context of finance or healthcare, AI models don’t fundamentally “know” SEC, FINRA, HIPAA, or any of the regulations. AI guardrails are needed to help ensure that AI outputs don’t go off the rails by ignoring laws. I believe that over the next three to five years, an increasing number of RegTech startups and larger LLM firms working with specific industry players will offer a middle layer of AI aimed to help address these regulatory areas. A safety layer is needed to sit on top of LLMs to help with rules/regulations, otherwise adoption will stall out.
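To illustrate the idea of a safety layer sitting on top of an LLM, here is a minimal sketch in which a draft is checked against a rule list before it is returned. The rule list and the flag_issues() logic are simplified assumptions, not real SEC, FINRA, or HIPAA checks, which require far more sophisticated models and human oversight.

```python
# Simplified, hypothetical rules: phrase -> reason it would be flagged.
PROHIBITED_PHRASES = {
    "guaranteed returns": "Potentially misleading performance claim",
    "risk-free": "Omits required risk disclosure",
}

def flag_issues(draft: str) -> list[str]:
    # Return the reasons for every rule the draft appears to violate.
    lowered = draft.lower()
    return [reason for phrase, reason in PROHIBITED_PHRASES.items() if phrase in lowered]

def safety_layer(draft: str) -> dict:
    # Wrap the LLM output: approve it only if no rule was flagged.
    issues = flag_issues(draft)
    return {"approved": not issues, "issues": issues, "draft": draft}

print(safety_layer("This fund offers guaranteed returns with no downside."))
print(safety_layer("Past performance does not guarantee future results."))
```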
Agentic AI is good at math problems, symbolic problems, and common-sense reasoning tasks that LLMs alone couldn't handle. I believe that, with the combination of agentic AI and safety guardrails, the following prompt could be doable in three minutes:
Write a five-page article about alternative investments offered by company X. Please make the content compliant with SEC rule 482, FINRA rule 2210, and Regulation Best Interest. Show returns for 1, 3, and 5 years. Compare and contrast with similar products from company Y. Generate 3 graphics incorporating brand guidelines found at: X/assetmanagement/brand/images
With the combination of agentic AI and focused compliance models, this complex problem can be broken into tasks and solved. In fact, Saifr performed an experiment in which we used nine models to solve it in three minutes. The analysts we showed the output to had no idea it was AI-generated! While this is still experimental, hasn't solved the hallucination problem completely, and will require a full assessment before real-life application, these are very encouraging results that we expect will only get better.
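As a hypothetical illustration (not Saifr's actual pipeline), an agentic system might break the prompt above into sub-tasks and route each one to a specialized model or tool. The task names and the dispatch() stub below are assumptions for the sketch.

```python
# Sub-tasks an orchestrator might derive from the complex prompt above.
SUBTASKS = [
    "draft a five-page article on company X's alternative investments",
    "check the draft against SEC rule 482, FINRA rule 2210, and Reg BI",
    "pull 1-, 3-, and 5-year return figures",
    "compare with similar products from company Y",
    "generate 3 on-brand graphics",
]

def dispatch(task: str) -> str:
    # Placeholder: in a real system each task would go to a focused model,
    # a compliance checker, a data service, or an image generator.
    return f"[result of: {task}]"

article_parts = [dispatch(t) for t in SUBTASKS]
print("\n".join(article_parts))
```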
There is yet another category of AI coming that incorporates the idea of reasoning and action. When we solve a problem, we usually have an observation, a thought, and an action. We assess whether it worked and, if not, try again until we get it right. This AI is mostly academic now, but the thinking is that it may become more accurate than chain-of-thought models.
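A minimal sketch of that reason-and-action loop is below: observe, think, act, then check whether the action worked and retry if not. The think() and act() functions are hypothetical stand-ins for a model and its tools.

```python
def think(observation: str) -> str:
    # Placeholder for the model proposing the next action from an observation.
    return f"try fixing based on: {observation}"

def act(thought: str) -> bool:
    # Placeholder for executing the action; returns whether it succeeded.
    # (Here, it arbitrarily "succeeds" on the third attempt.)
    return "attempt 3" in thought

def solve(problem: str, max_steps: int = 5) -> bool:
    observation = problem
    for step in range(1, max_steps + 1):
        thought = think(f"{observation} (attempt {step})")
        if act(thought):
            return True  # the action worked; stop
        # Otherwise, fold the failure back into the next observation and retry.
        observation = f"{problem}, previous attempt failed"
    return False

print(solve("unit test is failing"))
```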
I think these two new types of AI will likely dominate over the next three to five years, making AI better and better and able to solve key problems that businesses face.
End users are adopting many AI solutions but are not yet deploying them at scale to solve key problems. And fundamental safety and regulatory concerns have begun to stall growth. With the combination of agentic AI, quality data, and guardrails, AI can help solve business problems more efficiently. I think the adoption curve will begin to ramp up in 2025 as folks experience AI's ability to solve complex problems and become more optimistic.
The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information. The information regarding AI tools provided herein is for informational purposes only and is not intended to constitute a recommendation, development, or security assessment advice of any kind.
1172771.1.0