
Invest in quality data to make AI work for you

Generative AI works best when designed for specific contexts—which means your firm’s data could hold massive potential.

2023 is the year of AI breakthroughs, as generative systems and large language models continue to grab the public’s imagination.

Barely a day goes by without an announcement of a new vertical or horizontal application. Few industries are not in the frame. Certainly, more and more enterprises are investing in GPT-based or rival systems, in some cases pivoting to AI-centric workflows and content. The curve of adoption is steep, in a data-fueled economy that allows transformer models to be trained quickly and cheaply.

But are things really as simple as the hype suggests?

The answer depends on where your data is sourced and whether it is matched by your in-house expertise. That’s because AI should never be seen as a replacement for human knowledge and insight—for the simple reason that it needs to be trained on precisely those things, by human beings.

Some of these AIs are generating derivative work that only seems ‘intelligent’ because it recycles words that brilliant humans once used—often in very different contexts.

For enterprises keen to explore AI’s potential, therefore, one key consideration is to apply these tools to your own trusted data, rather than to data pulled from the public internet. With AI’s help, trusted data may generate insights from the history, mission, and embedded skills of your enterprise, rather than from the undifferentiated voices of millions of strangers.

But where training data has simply been scraped from the pre-2021 internet, leaders should be cautious about the outputs of generative AIs. The reliability and trustworthiness of some sources may be in question, and it is important to consider ethics too, where proprietary data may have been scraped without permission.

The U.S. government, the European Union, the U.K.’s Competition and Markets Authority (CMA), and a growing alliance of academics and technology entrepreneurs have all urged caution. The latter are pressing for a development moratorium to counterbalance what some see as a risky, tactical rush by both providers and customers, rather than a considered strategic move.

The White House said providers have a duty to ensure that products are “ethical, trustworthy, responsible, and serve the public good.”1 Business leaders should follow the same advice, as set out in the advisory AI Bill of Rights, the government’s suggested code of good behavior.

So, why all the fuss?

One of the challenges of this wave of popular uptake is that some public-facing versions of these powerful tools have been presented almost as playthings that encourage users to experiment, generating complex outputs from the simplest prompts.

Technology should be intuitive and fun, of course, and that ease of use and richness of output are impressive. But though tempting, tools that output far more than we input should be approached with common sense, and perhaps even a degree of professional skepticism.

After all, when are things ever so easy in business?

The old adage, “If it seems too good to be true, it probably is,” comes into play. The ease with which users can generate apparently new content with negligible effort or cost draws them deeper into relationships with vendors. As a result, they may come to rely on them for business insights, expertise, skilled outputs, and thought leadership.

That doesn’t seem wise—especially when they may be neglecting their own in-house skills, expertise, and experience in the process. And, just as important, their own trusted data.

As touched on earlier, one of the problems with some popular, cloud-based AI tools is that they have been trained on a mass of data scraped from the web—in some cases without permission, and in others on content that is inaccurate, biased, misleading, or false, including deliberate misinformation.

Today, the presence of bias may not be deliberate, but may simply reflect flawed human behavior from previous decades: employment opportunities denied to women, for example, or to minorities. But if such data are used to train AI algorithms without considering balance or consequence, then longstanding societal problems might become automated.

By taking everything those AI models present at face value, therefore, organizations are embracing risk as well as opportunity. The human resources you should value most are the ones that already exist in your enterprise.

Remember: generative AIs do not produce content in the same imaginative way as those employees, from their own experiences. AIs have no self-awareness or sentience—nor any comprehension of the outputs they are generating. In reality, they are neither artificial nor intelligent. They are incapable of original thought.

They are entirely dependent on training data, on processing the past to decipher logical probabilities. That’s fine when the data is deep, focused, and specific—the results can be transformative. But when it has been scraped from the web, it presents a risk.

What you put into the AI model yourself—in the form of your own, trusted data—leads directly to whatever value and insights you will extract from it. But this requires an upfront investment in quality data, rather than the minimal-effort mindset implied by some online tools. Users should be active and engaged in AI-assisted processes, not passive consumers expected to click for instant insight.

So, while free, cloud-based generative AIs brought us to this tipping point in popular adoption, in some ways such tools can be counterproductive in an enterprise context. The content they generate is simply not specialized enough to be relevant to most businesses. Drawing from too large a source both ignores the deeper needs of the user and skews the intent.

And that is not the only challenge.

General-purpose generative AIs continue to suffer from the ‘hallucination’ problem. Once they run out of things to say based on data analysis, they begin to invent content that is no longer rooted in facts. This problem can be limited by only using data that is trusted and relevant to the business—and retaining the human expertise needed to spot unsupported conclusions.

The most promising AIs are those that have received the most focused training and programming, using trusted, industry-specific data sets. Because they draw from a deep pool of sector knowledge, they can be trained to generate valuable content in niche areas.

Generative AI works best when it has been designed for use in specific contexts. Indeed, it can work exceptionally well when applied to focused use cases, where a unique set of learning parameters adds trust and depth to the training data.

This is especially true in industries with strict regulations about accurate communications—for example, healthcare, advertising, and financial services.

The key lesson is that generative AI is safer with better training. And this often takes place in highly specialized industries that have both nurtured in-house expertise and retained it.

So, maintaining your organization’s human touch and expertise is essential. What you invest upfront in terms of data is what pays business dividends in the end.

1. White House: Biden-Harris Administration Announces New Actions to Promote Responsible AI Innovation that Protects Americans’ Rights and Safety; 04 May 2023

The information regarding AI tools provided herein is for informational purposes only and is not intended to constitute recommendation, development, or security assessment advice of any kind.

The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information. Fidelity and any other third parties are independent entities and not affiliated. Mentioning them does not suggest a recommendation or endorsement by Fidelity.


Vall Herard

CEO and Co-founder
Vall’s expertise is at the intersection of financial markets and technology with extensive experience in FinTech, RegTech, InsurTech, capital markets, hedge funds, AI, and blockchain. Vall previously worked at BNY Mellon, BNP Paribas, UBS Investment Bank, Numerix, Misys (now Finastra), Renaissance Risk Management Labs, and Barrie + Hibbert (now Moody’s Analytics Insurance Solutions). He holds an MS in Quantitative Finance from New York University and a BS in Mathematical Economics from Syracuse and Pace Universities, as well as a certificate in big data & AI from MIT.
