AI’s jagged frontier: conflicting research about AI’s capabilities

Written by Vall Herard | Jan 10, 2024 2:26:00 PM

Artificial intelligence (AI) is transforming a variety of professions, but what is its real impact? New research from academia and industry provides tangible evidence of the potential benefits and risks of AI augmentation. Some studies have zeroed in on generative AI’s effects on tasks like writing and customer service interactions, while others have raised questions about its accuracy. The findings indicate that this technology, still in its infancy, can enable professionals to work faster, produce higher quality output, and earn better reactions from stakeholders. But the picture is not always that simple.

This roundup summarizes the findings from a handful of studies. While more research is always helpful, the early evidence signals that generative AI can provide a competency boost for workers and a value gain for employers.

Is AI a subject matter expert?

We’ve heard a lot about how AI can seemingly tackle any intellectual task. In March 2023, for example, GPT-4 (an AI chatbot) scored 297 on the bar exam, putting it in the 90th percentile of human test-takers and high enough to practice law in most states, researchers found.

It’s not always sunshine and rainbows, though. AI does not actually know everything.

In May 2023, researchers at Long Island University asked the free version of ChatGPT 39 questions about medications. The chatbot answered just 10 of those questions to the research board’s satisfaction. For the other 29 questions, its responses did not directly address the question, or were inaccurate, incomplete, or both. The results raise concerns that patients who use ChatGPT for medical information may be misled or endangered. OpenAI, the company behind ChatGPT, noted that the chatbot is not fine-tuned to provide medical information and pointed to its usage policy, which states that users should not rely on ChatGPT for medical advice or care.

Another study in August 2023 looked at ChatGPT’s answers to software engineering and development questions and found that the AI was incorrect 52% of the time. Interestingly, though, users still preferred ChatGPT’s answers 40% of the time because the chatbot is so articulate.

There are numerous other examples of ChatGPT’s uneven ability to be a subject matter expert, so this is an area that will likely see continued experimentation.

Can AI boost productivity?

As I’ve discussed in a previous blog post, some studies have shown that large language models like ChatGPT can boost productivity.

An MIT study examined the impact of generative AI on professional writers. Researchers gave one group access to ChatGPT, while a control group worked unaided. The ChatGPT group finished assignments 40% faster than the control group, and their work was rated 18% higher quality. Furthermore, using AI helped less-experienced workers produce higher quality work faster, compressing the productivity distribution and lowering worker inequality. Participants also reported increased job satisfaction from using AI, likely because they were able to simultaneously and easily improve quality and efficiency.

Similarly, NBER examined how generative AI impacts customer service agents. Researchers provided one group of agents with an AI assistant that suggested responses and linked to relevant internal technical documents. Agents with access to the AI tool were 14% more productive on average, resolving more customer issues per hour. Some even achieved gains of 22%. Access to AI again helped upskill junior workers more quickly; agents using AI for two months performed as well as or better than tenured agents who weren’t using AI. Lastly, customers responded more positively to those agents using generative AI.

The message here is clear: AI can help workers do their jobs faster and better.

What is AI’s jagged frontier?

But that isn’t always the case.

To further complicate matters, one study revealed that generative AI is a help in some contexts and a hindrance in others. A group from Harvard Business School suggested that AI has a “jagged technological frontier” in which certain tasks align well with AI strengths (allowing AI systems to excel at them), while other tasks that appear similarly complex prove more difficult for AI to complete at human-level quality.

To study this jagged frontier, the researchers assigned several hundred Boston Consulting Group consultants 18 realistic consulting tasks and gave about two-thirds of the group access to AI, specifically GPT-4 (the paid version of ChatGPT). For tasks within the frontier, the consultants using GPT-4 completed 12% more tasks on average and worked 25% faster. Their results were also rated more than 40% higher in quality than the control group’s.

However, the consultants using AI for tasks outside the frontier were 19% less likely to deliver the correct solution. This is an example of when generative AI can set you back: sometimes it’s more effective to perform a task yourself than to have AI help.

A McKinsey study of developer productivity came to a similar conclusion. Some coding tasks could be completed with AI assistance in half or two-thirds of the time they would traditionally take, signaling massive potential for productivity and efficiency gains. However, time savings for other tasks were just 10%, and for some junior developers, using AI took 7-10% longer.

McKinsey also found that completing tasks more quickly did not sacrifice quality: AI-augmented work was slightly better than solo human work. However, developers iterated with the AI tools to reach that quality, suggesting (as many already believe) that AI is better used to augment human workers, not replace them.

Takeaways

While AI holds great promise, we have to approach its integration thoughtfully. As we’ve seen, AI doesn’t know everything and needs a human in the loop. Medical advice is just one example of an application that requires specialized training and human oversight. However, more generalized applications can certainly be a boon for productivity, having already demonstrated major benefits. With tasks like writing and customer service, AI can enable professionals to work faster and more accurately.

Still, today’s AI has uneven capabilities and limitations. The jagged frontier warrants further research and experimentation, especially as AI technologies evolve and improve. With thoughtful and strategic deployment focused on augmentation rather than automation, AI can uplift a wide range of professions without compromising ethics or excellence. The early research shows cause for optimism, combined with the need for human agency, caution, and sound judgement. I plan to follow along as additional studies are published.

Are you considering AI solutions for your business? Ask the right questions.

The opinions provided are those of the author and not necessarily those of Fidelity Investments or its affiliates. Fidelity does not assume any duty to update any of the information. Fidelity and any other third parties are independent entities and not affiliated. Mentioning them does not suggest a recommendation or endorsement by Fidelity. The information regarding AI tools provided herein is for informational purposes only and is not intended to constitute a recommendation, development, security assessment advice of any kind.

