Evaluations and Chat Completions: need support for tool use and image input

Hey OpenAI team and community! We have been trying out Evals and Chat Completions storage from the OpenAI platform dashboard for our eval use case, and we ran into a few issues that made it unusable for us.

When attempting to run evaluations, we noticed that:

  • The dashboard imports only the system and user prompts, ignoring both the tools specified on the request and any image input.
  • The assistant’s outputs produced via function calls are not captured during the evaluation, resulting in empty outputs and, consequently, failing tests. (A sketch of the request shape we are storing follows this list.)
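
For context, this is roughly how we store the completions. This is a minimal sketch using the standard OpenAI Python SDK; the tool name, parameter schema, and image URL are hypothetical placeholders, not our actual setup:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool definition, just to illustrate the request shape.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # placeholder name
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    store=True,  # persist the completion so it appears in the dashboard
    tools=tools,
    messages=[
        {"role": "system", "content": "You are an order-support assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is the status of the order in this screenshot?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/order.png"}},
            ],
        },
    ],
)
```

When a stored completion like this is imported into an eval, only the system prompt and the user text seem to survive; the `tools` array and the `image_url` content part appear to be dropped.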

Interestingly, when we switch to a different model (e.g., GPT-4) and let it generate responses on the fly for the same inputs, the evaluations produce outputs and pass the tests. In other words, the evals capture only the assistant's text content and ignore the tool-use output.
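
Our working theory, illustrated against the hypothetical response above: on a tool-call turn the SDK returns `None` for `message.content` and puts the model's answer in `message.tool_calls`, so an eval that reads only the content field sees an empty output.

```python
choice = response.choices[0]

# On a tool-call turn, the assistant's text content is typically None;
# the actual answer lives in the tool_calls list instead.
print(choice.message.content)  # -> None when the model chose to call a tool

if choice.message.tool_calls:
    call = choice.message.tool_calls[0]
    print(call.function.name)       # e.g. "get_order_status"
    print(call.function.arguments)  # JSON-encoded argument string
```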

  • Is there a current plan to support tools and function calling in the evaluations dashboard?
  • If not presently supported, do you have an estimated timeline for when this feature might be available?