
Google DeepMind Unveils TOUCAN: A Groundbreaking Dataset for AI Tool Interaction

TOUCAN, described as the largest dataset of real tool interactions, aims to improve how AI agents work with external tools. Open models fine-tuned on it show measurable gains on tool-use benchmarks.



Google DeepMind's research team has introduced TOUCAN, an open dataset designed to improve how AI agents interact with external tools. The dataset was developed by researchers from MIT, IBM, and the University of Washington, and it has shown promising results in improving tool use by open models.

TOUCAN is the largest dataset of its kind, with 1.5 million real tool interactions. It was generated by a five-stage pipeline that combines multiple language models with actual API executions in real environments. This sets it apart from earlier datasets, which relied on simulated tool responses.

The dataset draws on 495 real Model Context Protocol (MCP) servers, which together offer more than 2,000 tools spanning a wide range of domains. Each entry in TOUCAN records a complete usage chain, from the task description to the final result. On the MCP-Universe benchmark, models fine-tuned on TOUCAN outperformed larger open models such as Llama-3.3 (70B) and GLM-4.5 (106B) in some cases. On the Berkeley Function Calling Leaderboard (BFCL V3), the score of Qwen2.5-32B rose by 8.7 percentage points after fine-tuning on TOUCAN, which points to the dataset's potential for improving tool use in open models.
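To make the idea of a "complete usage chain" concrete, here is a minimal Python sketch of how one such record might be represented. The class and field names (`Trajectory`, `ToolCall`, `server`, `final_answer`) and the example weather tool are hypothetical and do not reflect TOUCAN's actual schema. The only protocol detail taken from MCP is that a client invokes a tool with a JSON-RPC 2.0 `tools/call` request carrying the tool's `name` and `arguments`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One tool invocation on an MCP server (hypothetical field names)."""
    server: str                 # which MCP server exposed the tool
    tool: str                   # tool name as advertised by that server
    arguments: dict[str, Any]   # JSON-serializable arguments for the tool
    result: str                 # text returned by the real execution


@dataclass
class Trajectory:
    """A usage chain: task -> tool calls -> final answer (assumed layout)."""
    task: str
    calls: list[ToolCall] = field(default_factory=list)
    final_answer: str = ""

    def is_complete(self) -> bool:
        # Assumed criterion: a task, at least one tool call, and a
        # non-empty final answer must all be present.
        return bool(self.task) and bool(self.calls) and bool(self.final_answer)

    def to_jsonrpc_requests(self) -> list[dict[str, Any]]:
        # MCP runs over JSON-RPC 2.0; tools are invoked via "tools/call".
        return [
            {
                "jsonrpc": "2.0",
                "id": i,
                "method": "tools/call",
                "params": {"name": c.tool, "arguments": c.arguments},
            }
            for i, c in enumerate(self.calls, start=1)
        ]


# Hypothetical example record; the "weather" server and its tool are made up.
example = Trajectory(
    task="What is the weather in Seattle right now?",
    calls=[
        ToolCall(
            server="weather",
            tool="get_current_weather",
            arguments={"city": "Seattle"},
            result="12°C, light rain",
        )
    ],
    final_answer="It is currently 12°C with light rain in Seattle.",
)

print(example.is_complete())
print(example.to_jsonrpc_requests()[0]["method"])
```

Keeping the tool calls as an ordered list preserves the sequence of steps an agent took. Storing each call's real `result` alongside its arguments is what distinguishes executed interactions from simulated ones.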

Taken together, the dataset's scale and its benchmark gains suggest that TOUCAN can meaningfully strengthen tool use in open models, a notable step for AI agents that depend on external tools.
