Approach for using Evals for Assistants?

I’m a bit new to OpenAI and have been working with it for a few weeks. I’ve been through the docs, but as far as I can tell, evals are only available for Chat Completions. So I hope someone can help me with these questions:

  1. I think it is possible for us to run evals and perform fine-tuning on a model, and then have an assistant that uses that model. Is this correct?
  2. Is something like Stored Completions possible for assistants, or is that not available? As far as I can see right now, analyzing past threads and manually creating a dataset for the evals is the only option (roughly the approach sketched below).
  3. Any other recommended approaches if we want to use real conversations as training data?
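For context, this is roughly what I mean by manually building a dataset from past threads. It’s only a sketch, assuming the Python SDK and that you already know the thread IDs (there’s no API call to list all threads, so `thread_ids` here is just a placeholder). The output is a chat-completions-style JSONL file that could then be pointed at fine-tuning or evals:

```python
from openai import OpenAI
import json

client = OpenAI()

# Placeholder: in practice you have to track thread IDs yourself,
# since threads can't currently be listed via the API.
thread_ids = ["thread_abc123", "thread_def456"]

with open("assistant_conversations.jsonl", "w") as f:
    for thread_id in thread_ids:
        # Pull the thread's messages oldest-first.
        messages = client.beta.threads.messages.list(
            thread_id=thread_id, order="asc"
        )

        conversation = []
        for msg in messages.data:
            # Each message holds a list of content blocks; keep the text ones.
            text_parts = [
                block.text.value
                for block in msg.content
                if block.type == "text"
            ]
            if text_parts:
                conversation.append(
                    {"role": msg.role, "content": "\n".join(text_parts)}
                )

        # One JSONL record per thread, in chat-completions format,
        # usable as a starting point for a fine-tuning or eval dataset.
        f.write(json.dumps({"messages": conversation}) + "\n")
```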

Yup, same question here.

Is store: true available on the Assistants API?

store: true does not appear to be supported on the Assistants API. Some documentation suggests that, since the thread itself is persistent, you can request its run stats, capture the prompts and responses, and use those runs in your own eval tooling. I don’t see how to use OpenAI’s Evaluation features on Assistants… which is a big gap for me.
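For what it’s worth, pulling the run stats looks roughly like this. It’s a sketch assuming the Python SDK and a thread ID you already have; what you do with the numbers depends entirely on your own eval/logging tooling:

```python
from openai import OpenAI

client = OpenAI()

thread_id = "thread_abc123"  # placeholder: an existing thread you created

# List the runs executed against this thread; each run carries status,
# model, and token usage that you can log alongside the captured
# prompts and responses.
runs = client.beta.threads.runs.list(thread_id=thread_id, order="asc")

for run in runs.data:
    usage = run.usage  # None until the run reaches a terminal state
    print(
        run.id,
        run.status,
        run.model,
        usage.prompt_tokens if usage else None,
        usage.completion_tokens if usage else None,
    )
```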