Introducing eval support for tool use

henrysg · June 9, 2025, 9:04pm

The OpenAI Evals product now lets you evaluate tool use! You can use tools and Structured Outputs in generations run through both the API and the web platform, then evaluate each tool call based on the arguments it received and the response it returned.

This works with OpenAI-hosted tools, MCP tools, and non-hosted tools.
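The announcement doesn't show what an argument-based check looks like. Here is a minimal sketch that scores one sample against a dataset item's expected tool and arguments. It assumes the model's tool calls arrive as Chat Completions-style objects, with a `function.name` and a JSON-encoded `function.arguments` string. The field names `output_tools`, `expected_tool`, and `expected_args` are hypothetical. The `grade(sample, item)` shape is loosely modeled on the Evals Python grader, so check the Evals docs for the real sample schema before adapting it.

```python
import json
from typing import Any


def grade(sample: dict[str, Any], item: dict[str, Any]) -> float:
    """Score a model's tool call against the expected call in a dataset item.

    Hypothetical shapes, for illustration only:
      sample["output_tools"]: list of tool calls, each
        {"type": "function", "function": {"name": str, "arguments": str}}
      item["expected_tool"]: str, item["expected_args"]: dict
    Returns 1.0 if name and arguments match, 0.5 if only the name matches,
    and 0.0 otherwise.
    """
    calls = sample.get("output_tools") or []
    if not calls:
        return 0.0  # the model made no tool call at all
    # Only the first tool call is checked in this sketch.
    fn = calls[0].get("function", {})
    if fn.get("name") != item["expected_tool"]:
        return 0.0
    try:
        args = json.loads(fn.get("arguments") or "{}")
    except json.JSONDecodeError:
        return 0.5  # right tool, but the arguments are not valid JSON
    # Compare the parsed arguments, so key order and whitespace don't matter.
    return 1.0 if args == item["expected_args"] else 0.5


sample = {"output_tools": [{"type": "function",
                            "function": {"name": "get_weather",
                                         "arguments": '{"city": "Paris"}'}}]}
item = {"expected_tool": "get_weather", "expected_args": {"city": "Paris"}}
print(grade(sample, item))  # prints 1.0
```

Partial credit for picking the right tool with the wrong arguments is a design choice. A stricter eval could return 0.0 in that case.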

Please let me know if you have any feedback on this!

You can see examples of this in several new cookbooks:

Web search evaluation: Evals API Use-case - Web Search Evaluation
Tool evaluation: Evals API Use-case - Tools Evaluation
MCP evaluation: Evals API Use-case - MCP Evaluation
Structured output evaluation: Evals API Use-case - Structured Outputs Evaluation

PaulBellow · June 9, 2025, 9:31pm

Welcome to the community, @henrysg!

Seriously, though, thanks for the update! Keep it coming, please!

tgravagno · June 11, 2025, 8:11pm

The docs for Evals sit under Model Optimization. I understand that one use case for Evals is helping fine-tune models. But it seems Evals can support any kind of development, including basic prompt engineering for ordinary API calls or even tuning Custom GPTs.

I’d suggest cross-linking the docs for these areas so that people can find and use this valuable resource.
