Is there a way to generate dataset based on requests to Assistant API?

Hi,

There is an example in the Evaluations guide how to generate a dataset from real traffic from completions by using the store: true parameter.

Is there a way to generate a dataset from real traffic to Assisnatnt API?

I haven’t found anything similar to the store: true parameter in the Assistant API reference. Also, I haven’t seen any API to create dataset items manually.

My use case: I am building an assistant based on the Assitant API and I want to generate a dataset from real traffic. The dataset should contain a system prompt, a history of chat messages, and attached functions for every call to Assistant API so I can use this dataset in evals and fine-tuning.

2 Likes

You cannot fine-tune on “tools”.

You only get one pre-created tool ,“functions”, to place your functions in.

Assistants uses “tools”

file_search results, for example, are proprietary and are tools messages when returned.

Therefore it is unlikely even in the future.

Assistants also has language out of your control that OpenAI may change.

1 Like

Right, but the tools/functions are secondary to my original question.

The main question is how do I generate a dataset from real traffic to Assisnatnt API?

Is there a way, like in the case with completions?

How: you cannot.

A full dump of all the API calls that Assistants make would be undesirable to OpenAI to provide. Heck, they didn’t even let you see how much it was costing when it was released.

If they managed to keep fine-tuning data locked up in some uninspectable format to train on, that could only be generated by running models that can already do what you want to do, would you even want to just accept that anyway?

1 Like