Introducing eval support for tool use

The OpenAI Evals product now supports evaluating tool use! You can use tools and Structured Outputs in generations completed through both the API and the web platform, and then evaluate the resulting tool calls based on the arguments they received and the responses they returned.

This supports OpenAI-hosted tools, MCP tools, and non-hosted (custom) tools.
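To make the idea concrete, here is a minimal sketch of the kind of check an argument-based grader performs. This is not the Evals API itself; the helper name and the tool-call shape are illustrative, loosely mirroring how function tool calls carry a name plus a JSON-encoded arguments string:

```python
import json

def grade_tool_call(call, expected_name, required_args):
    """Hypothetical grader: pass if the model called the expected tool
    with the required argument values.

    `call` is a dict with a "name" and a JSON-encoded "arguments" string,
    similar in shape to a function tool call."""
    if call["name"] != expected_name:
        return False
    args = json.loads(call["arguments"])
    # Every required key must be present with the expected value;
    # extra arguments are allowed.
    return all(args.get(k) == v for k, v in required_args.items())

# Example: grade a weather-lookup call produced by a model.
call = {
    "name": "get_weather",
    "arguments": '{"city": "Paris", "unit": "celsius"}',
}
grade_tool_call(call, "get_weather", {"city": "Paris"})  # True
```

The same pattern extends to grading responses: compare what the tool returned against an expected value or pass it to a model-based grader.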

Please let me know if you have any feedback on this!

You can see examples of this in several new cookbooks.


Welcome to the community, @henrysg! :wink:

Seriously, though, thanks for the update! Keep it coming, please!


The docs for Evals are under Model Optimization. I understand that one use case for Evals is helping fine-tune models, but Evals seems useful for any kind of development, including basic prompt engineering for normal API exchanges and even tuning Custom GPTs.

I’d suggest cross-linking the docs for these areas so that people can find and make use of this valuable resource.
