Approach for using Evals for Assistants?

I’m a bit new to OpenAI and have been working with it for a few weeks. I’ve been through the docs, but as far as I can tell, evals are only available for Chat Completions. So I hope someone can help me with these questions:

  1. I think it is possible for us to run evals and perform fine-tuning on a model, and then have an assistant that uses that model. Is this correct?
  2. Is something like Stored Completions possible for assistants, or is that not available? As far as I can see right now, analyzing past threads and manually creating a dataset for the evals is the only option (roughly the approach sketched below).
  3. Any other recommended approaches if we want to use real conversations as training data?
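For context, this is roughly what I mean by manually building a dataset from past threads. It’s only a sketch, assuming the Python SDK and that you already know the thread IDs (there’s no API call to list all threads, so `thread_ids` here is just a placeholder). The output is a chat-completions-style JSONL file that could then be pointed at fine-tuning or evals:

```python
from openai import OpenAI
import json

client = OpenAI()

# Placeholder: in practice you have to track thread IDs yourself,
# since threads can't currently be listed via the API.
thread_ids = ["thread_abc123", "thread_def456"]

with open("assistant_conversations.jsonl", "w") as f:
    for thread_id in thread_ids:
        # Pull the thread's messages oldest-first.
        messages = client.beta.threads.messages.list(
            thread_id=thread_id, order="asc"
        )

        conversation = []
        for msg in messages.data:
            # Each message holds a list of content blocks; keep the text ones.
            text_parts = [
                block.text.value
                for block in msg.content
                if block.type == "text"
            ]
            if text_parts:
                conversation.append(
                    {"role": msg.role, "content": "\n".join(text_parts)}
                )

        # One JSONL record per thread, in chat-completions format,
        # usable as a starting point for a fine-tuning or eval dataset.
        f.write(json.dumps({"messages": conversation}) + "\n")
```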

Yup, same question here.

Is store: true available on the Assistants API?

store: true does not appear to be supported on the Assistants API. Some documentation suggests that, since the thread itself is persistent, you can request its run stats, capture the prompts and responses, and use those runs in your own eval tooling. I don’t see how to use OpenAI’s Evaluation features on Assistants… which is a big gap for me.
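For what it’s worth, pulling the run stats looks roughly like this. It’s a sketch assuming the Python SDK and a thread ID you already have; what you do with the numbers depends entirely on your own eval/logging tooling:

```python
from openai import OpenAI

client = OpenAI()

thread_id = "thread_abc123"  # placeholder: an existing thread you created

# List the runs executed against this thread; each run carries status,
# model, and token usage that you can log alongside the captured
# prompts and responses.
runs = client.beta.threads.runs.list(thread_id=thread_id, order="asc")

for run in runs.data:
    usage = run.usage  # None until the run reaches a terminal state
    print(
        run.id,
        run.status,
        run.model,
        usage.prompt_tokens if usage else None,
        usage.completion_tokens if usage else None,
    )
```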