ZDNET’s key takeaways
- Even the best AI models are challenged to carry out tasks via MCP.
- New benchmarks show models struggle when tasks become more complex.
- AI models need more training specific to MCP use.
An emerging category of artificial intelligence middleware known as Model Context Protocol is meant to make generative AI programs such as chatbots more powerful by letting them connect to various external resources, including packaged software such as databases.
Multiple studies, however, reveal that even the best AI models struggle to use Model Context Protocol. Top AI models such as Google's Gemini 2.5 Pro require many rounds of interaction with the external programs, leading to long delays before the AI produces an answer.
Also: What is Model Context Protocol? The emerging standard bridging AI and data, explained
"Even state-of-the-art models struggle with different capabilities," write Zhenting Wang and team at consulting firm Accenture, the MIT-IBM Watson AI Lab, and the University of California, Berkeley, in an August paper that introduced MCP-Bench, a set of 250 tasks for AI agents employing MCP.
"Performance generally declines as tasks transition from Single Server to Multi Server scopes," wrote Zikang Guo and team at the University of Science and Technology of China last month, after testing several AI models on their own benchmark, MCP-AgentBench.
Even the best models today, including OpenAI's GPT-5, have "failure cases" arising from "repetitive or exploratory interactions that fail to make meaningful progress," write lead author Zijian Wu and the team at the National University of Singapore and collaborating institutions in the paper announcing their benchmark, MCPMark, last month.
Where an AI model can go wrong with MCP
MCP is a kind of middleware for turning AI into client-server interactions. It was introduced last year by gen AI startup Anthropic (makers of the Claude family of large language models and chatbots) as a secure, industry-standard way to connect LLMs and AI agents to external software resources such as databases and customer relationship management software.
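Under the hood, those client-server interactions are JSON-RPC 2.0 messages: the AI client first asks a server what tools it offers, then invokes one with structured arguments. The sketch below builds those two core requests; the method names `tools/list` and `tools/call` come from the MCP specification, while the tool name `query_database` and its arguments are hypothetical examples, not part of the spec.

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope, as MCP clients do."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# Step 1: the client asks the MCP server which tools it exposes.
list_tools = make_request(1, "tools/list")

# Step 2: the client calls one tool with structured arguments
# (tool name and arguments here are hypothetical).
call_tool = make_request(2, "tools/call", {
    "name": "query_database",
    "arguments": {"table": "customers", "limit": 10},
})

print(json.dumps(call_tool, indent=2))
```

The standard envelope is what lets any MCP-aware model talk to any MCP server without a custom connector for each pairing.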
As ZDNET’s Steven Vaughan-Nichols explains, middleware like MCP can reduce the number of connections that an AI program has to initiate to connect to multiple external resources.
Also: ChatGPT can now connect to MCP servers – here’s how, and what to watch for
However, having a standard does not mean that an AI model, whose functionality includes a heavy dose of chance (“probability” in technical terms), will faithfully implement MCP.
An AI model plugged into MCP has to generate output that achieves several things, such as formulating a plan to answer a query by choosing which external resources to access, in what order to contact the MCP servers that lead to those external applications, and then structuring several requests for information to produce a final output to answer the query.
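The loop those requirements imply can be sketched in a few lines of Python: the model picks a server and tool, the result is fed back into its context, and the cycle repeats until it answers or runs out of turns. Everything here (the stub model, the `crm` server, the `lookup` tool) is hypothetical scaffolding to show the shape of the interaction, not any benchmark's actual harness.

```python
def run_agent(query, servers, model, max_steps=10):
    """Drive a model through MCP-style tool calls until it answers."""
    context = [f"User query: {query}"]
    for _ in range(max_steps):
        # The model decides the next action from the query plus prior results.
        action = model(context, servers)
        if "answer" in action:
            return action["answer"]  # the plan succeeded
        # Otherwise, route the request to the chosen server and tool.
        result = servers[action["server"]](action["tool"], action["args"])
        context.append(f"Result of {action['tool']}: {result}")
    return None  # too many rounds without progress: the failure mode the benchmarks flag

# Hypothetical stand-ins: a model that queries a CRM once, then answers.
def stub_model(context, servers):
    if len(context) == 1:
        return {"server": "crm", "tool": "lookup", "args": {"id": 7}}
    return {"answer": context[-1]}

stub_servers = {"crm": lambda tool, args: f"record {args['id']}"}
answer = run_agent("Who is customer 7?", stub_servers, stub_model)
```

The `max_steps` cap is where the benchmarks' complaint shows up: a model that keeps issuing exploratory calls without converging burns through its budget and never answers.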
The various studies show that while top-of-the-line models such as Gemini 2.5 Pro and GPT-5 do better than less capable programs, all models are still limited in their ability to manage those challenges. Issues across all the models include taking an excessive number of steps to retrieve the information, even when the language model's plan of approach was sound to begin with.
What the benchmarks tell us