I’m using the Assistants API in combination with function calling. Instead of finishing up with a final message that is just flat text, I would like the assistant to output another JSON object that I can parse.
This is very similar to asking it for another function call. The problem is that when the assistant provides structured output (JSON) for a function call, it does so via a run with status “requires_action”. After that, the run will continue to wait for me to submit the output of the function back to it. Is there some configuration wherein the run “completes” after providing such structured output, i.e. with no need for the assistant to create a final flat-text message? The JSON is its final output.
In more detail: if I understand correctly, a typical assistant workflow is:
1. Input user message.
2. Assistant notices it needs a tool → creates JSON with the tool input.
3. Pass the assistant’s tool input into the function → create output.
4. Pass the function output back into the assistant.
5. If further steps are required → repeat steps 2-4.
6. Assistant decides it is done → creates flat-text output based on steps 1-5.
All of this is processed inside “run” objects. When the model wants to use a tool, the run enters status “requires_action” and waits for the output of the tool. When it finishes, the final run has status “completed”. Is it possible to have a run with status “completed” that, instead of flat text, contains JSON of structured data?
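For reference, a minimal sketch of that loop with the openai Node SDK might look like this (`executeTool` is a placeholder for your own dispatch, and the 500 ms polling interval is an arbitrary assumption):

```typescript
import OpenAI from "openai";

// Placeholder: your own dispatch from function name + JSON arguments to a result string.
declare function executeTool(name: string, args: string): Promise<string>;

const openai = new OpenAI();

async function runThread(threadId: string, assistantId: string) {
  let run = await openai.beta.threads.runs.create(threadId, {
    assistant_id: assistantId,
  });

  // Poll until the run reaches a terminal state or needs tool output.
  while (true) {
    run = await openai.beta.threads.runs.retrieve(threadId, run.id);

    if (run.status === "requires_action") {
      // Steps 2-4: hand each tool call to your implementation, then submit
      // the outputs back so the run can continue.
      const calls = run.required_action!.submit_tool_outputs.tool_calls;
      const outputs = await Promise.all(
        calls.map(async (call) => ({
          tool_call_id: call.id,
          output: await executeTool(call.function.name, call.function.arguments),
        }))
      );
      await openai.beta.threads.runs.submitToolOutputs(threadId, run.id, {
        tool_outputs: outputs,
      });
    } else if (run.status === "completed") {
      // Step 6: the run ends with a flat-text assistant message on the thread.
      return openai.beta.threads.messages.list(threadId);
    } else if (["failed", "cancelled", "expired"].includes(run.status)) {
      throw new Error(`Run ended with status ${run.status}`);
    }

    await new Promise((resolve) => setTimeout(resolve, 500));
  }
}
```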
One way to architect this application is to create a function (e.g., assistant_is_done) that is called whenever “the Assistant decides it is done”. You would specify, in the Assistant’s instructions and in the assistant_is_done function’s definition, that assistant_is_done should be called in the case of step 6 above. That way, once assistant_is_done is called, you can assume the Assistant is done and manage it as necessary in your application.
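As a rough illustration (the parameter schema and model name here are assumptions, not a prescribed shape), the declaration could look like:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical sketch: declare assistant_is_done as the terminal function and
// tell the Assistant, in its instructions, when to call it.
const assistant = await openai.beta.assistants.create({
  model: "gpt-4-turbo",
  instructions:
    "When you have finished all required steps, call assistant_is_done " +
    "with your final structured result instead of writing a text reply.",
  tools: [
    {
      type: "function",
      function: {
        name: "assistant_is_done",
        description: "Call this exactly once when the task is complete.",
        parameters: {
          type: "object",
          properties: {
            result: {
              type: "string",
              description: "The final structured result, serialized as JSON.",
            },
          },
          required: ["result"],
        },
      },
    },
  ],
});
```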
Yeah I thought of that as well, but the problem is that this seems very hacky:
- the run never completes, which means no assistant message is added to the list (I think).
- if the user has further follow-up questions that should be handled in the same thread, a record of the output should be kept, preferably in the standard way, which is an assistant message.
- it just feels hacky to leave the whole process mid-run while it’s waiting for input. Surely that’s not good design.
Thanks for the follow-up. Below are your concerns in bold, with a few thoughts on each:
- **the run never completes, which means no assistant message is added to the list (I think).** You could complete the run by implementing the functionality to end the run in the assistant_is_done function. The function could also add a final message to the thread for future reference if you’re using the list of messages for other applications (see the sketch after this list).
- **if the user has further follow-up questions that should be handled in the same thread, a record of the output should be kept, preferably in the standard way, which is an assistant message.** Can you give me more detail on this? How can the Assistant be done, yet the user still have follow-up questions?
- **it just feels hacky to leave the whole process mid-run while it’s waiting for input. Surely that’s not good design.** Assistants need some case to end their execution, and for this application it sounds like “the Assistant deciding it is done” is a valid one.
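A minimal sketch of that ending-the-run idea, assuming you handle assistant_is_done in your own code (note: whether created messages may use role "assistant" depends on the API version; older versions only accept "user"):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical handler invoked when the model calls assistant_is_done.
async function handleAssistantIsDone(threadId: string, runId: string, finalJson: string) {
  // End the run instead of submitting tool outputs.
  await openai.beta.threads.runs.cancel(threadId, runId);

  // Messages cannot be added while a run is still active, so wait for the
  // cancellation to settle before writing to the thread.
  let run = await openai.beta.threads.runs.retrieve(threadId, runId);
  while (run.status === "cancelling") {
    await new Promise((resolve) => setTimeout(resolve, 300));
    run = await openai.beta.threads.runs.retrieve(threadId, runId);
  }

  // Record the final output on the thread for future reference.
  // Assumption: role "assistant" is accepted; fall back to "user" on older API versions.
  await openai.beta.threads.messages.create(threadId, {
    role: "assistant",
    content: finalJson,
  });
}
```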
Most of my workflows work like that. For example, I have a ‘process this email’ assistant that has a lot of sub-tasks that it runs by calling functions. For example, the email might say ‘Please add this person to Salesforce …’
It will call a function to see if the person is already in Salesforce and, if not, add the person to Salesforce (yet another function call). Finally it will write a response email saying what it did (which could be ‘it was already in Salesforce’ or ‘I added this person to Salesforce, click to see it in Salesforce’).
In this example the final output is HTML (markdown) and not JSON, but I have similar examples. You just need to keep the run going until it’s completed.
You can tell the assistant to return a JSON-format response directly, and place the function-definition logic in your instructions.
This will not leave the assistant run hanging there, but your instructions grow in length and complexity, and you need to handle the case where the model does not follow the instructions correctly (you may have to count on the more diligent gpt-4 model; gpt-3.5 is not good at complex instructions).
Function calling was designed to execute a function which you implement: the model’s judgement decides whether to call the function and with what parameters, and you are responsible for the actual execution and for submitting the result back.
If you really want to hack function calling just to get a JSON response, you can skip the run round-trip and use the chat completions endpoint externally, then append whatever you think appropriate as a user message to the thread to maintain the context for the AI’s later reasoning.
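A rough sketch of that external round-trip, assuming JSON mode on the chat completions endpoint (the model name and message wording are assumptions):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Get a structured reply outside the Assistants run, then append it to the
// thread so later runs keep the context.
async function jsonReplyOutsideRun(threadId: string, prompt: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: "Reply only with a JSON object." },
      { role: "user", content: prompt },
    ],
  });
  const json = completion.choices[0].message.content!;

  // Append what you think appropriate to the thread for later reasoning.
  await openai.beta.threads.messages.create(threadId, {
    role: "user",
    content: `Structured result produced externally: ${json}`,
  });

  return JSON.parse(json);
}
```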
Hmmm, I suppose I could indeed manually add an assistant message in the last function call. It still feels a bit hacky, but it’s the best solution I can see.
As for your question “How can the Assistant be done, yet the user still have follow-up questions?”: this is a fairly standard flow, I think: user asks question → assistant loops over runs that require actions to research the answer → assistant finds the answer and posts it as a reply → user reads the answer and posts a follow-up question in the same thread → repeat…
Thanks for your response. Reading your reply, I believe our discrepancy is one of definition, not substance. I understood “Assistant be done” to mean that the application terminates, but here you clarify that it means that the Assistant responds. So, in the words of Lieutenant Frank Drebin, “nothing to see here, carry on”. 
This is an old thread - is there a recommended pattern for handling these interactions now?
I believe @willemvdb42 is asking for a moral equivalent of a webhook—a structured “event” generated by the model that, unlike a function call, comes with no expectation of a return value and no expectation that the model would generate further content in response to that return value.
In the absence of such a first-class construct, the best workaround I found to prevent the model from generating more content after calling a function was to cancel the run from within the function implementation: `await openai.beta.threads.runs.cancel(threadId, runId);`. This leaves control over the continuation of the conversation with the user rather than the assistant.
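In context, a hypothetical sketch of that pattern (the terminal function name emit_result and the trimmed-down run type are assumptions):

```typescript
import OpenAI from "openai";

// Minimal structural type for the parts of a run object used here.
type RunLike = {
  id: string;
  required_action: {
    submit_tool_outputs: {
      tool_calls: Array<{ id: string; function: { name: string; arguments: string } }>;
    };
  } | null;
};

// When the terminal function is called, treat its arguments as the final
// structured output and cancel the run instead of submitting tool outputs.
async function onRequiresAction(openai: OpenAI, threadId: string, run: RunLike) {
  for (const call of run.required_action!.submit_tool_outputs.tool_calls) {
    if (call.function.name === "emit_result") {
      const finalOutput = JSON.parse(call.function.arguments);
      // No return value is expected, so end the run here; the user, not the
      // assistant, now decides whether the conversation continues.
      await openai.beta.threads.runs.cancel(threadId, run.id);
      return finalOutput;
    }
  }
  // ...otherwise, submit tool outputs as usual.
}
```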
I think it is still relevant - like my original post about processing an email. Depending on the content of the email, a lot of function calling is done (or none), but the model will END with final output, which is the email that summarizes its response. In my workflow I ALSO have a ‘when_done’ function that is called AFTER the run is completed (so that is NOT a function call). In the email example, that function takes care of actually SENDING the email response back to the user. (I wrote about the overall process here.)