Multiple issues after migrating a multi-turn, streaming agent from the completions API to the responses API:
Instructions on when to call tools are not followed.
Apparent loss of system instruction context when previous_response_id is used.
When an empty tools array is passed to the responses API, and the instructions plus the full ensuing conversation are passed as input, the agent behavior (when no tool calls are needed) is the same as the completions API (see the sketch below).
While previous_response_id is an optimization and can be avoided, the tool calls are needed in certain cases, and the unexpected tool call behavior is preventing migration.
Environment: Node.js/TypeScript with openai 4.104.0.
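For reference, a minimal sketch of the setup described in the third point above; the model name, instructions, and messages are placeholders, not our actual agent code:

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function main() {
  // Empty tools array + instructions + full conversation as input,
  // no previous_response_id: in this shape the agent behaves the
  // same as it did on the completions API.
  const response = await client.responses.create({
    model: "gpt-4.1", // placeholder
    instructions: "You are a support agent. Only call tools when ...", // placeholder
    tools: [],
    input: [
      { role: "user", content: "First user turn" },
      { role: "assistant", content: "First assistant reply" },
      { role: "user", content: "Second user turn" },
    ],
  });
  console.log(response.output_text);
}

main();
```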
Thanks. The docs/api-reference/responses/create page says, for instructions:
When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
So it’s not clear: it implies that when using previous_response_id without the instructions param, the instructions will be carried over. Which would make sense: why pass the instructions on every call if they don’t change? That is the spirit of previous_response_id as I understand it. It seems to act like a kind of thread ID, so the model knows the context prior to the current response create call.
I also noticed that when using previous_response_id, every call to responses create returns the same response ID, which is more evidence that the response ID behaves like a thread ID, not an ID for a specific response call. It’s not clear what the expected/correct behavior is.
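As code, the reading of that doc note I’m testing looks like this (a sketch; model and prompts are placeholders): instructions re-sent on every chained call.

```ts
import OpenAI from "openai";

const client = new OpenAI();
const instructions = "Only call tools when the user asks for data."; // placeholder

async function twoTurns() {
  const first = await client.responses.create({
    model: "gpt-4.1", // placeholder
    instructions,
    input: "First turn",
  });

  const second = await client.responses.create({
    model: "gpt-4.1",
    // Per the doc note, instructions are NOT carried over through
    // previous_response_id, so they are repeated here.
    instructions,
    previous_response_id: first.id,
    input: "Second turn",
  });

  // The docs imply these IDs should differ; the behavior reported
  // above was that they come back identical.
  console.log(first.id, second.id);
}

twoTurns();
```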
As for the tool call issues, I’d have to see if I can create a sample project for that. I was hoping the very smooth and reliable tool + prompt behavior I got with the completions API would work as-is with the responses API, but the behavior is very different (neither smooth nor consistent), and even prompt changes didn’t fix what appear to be overly eager (and incorrect) model tool calls.
A previous response ID refers to exactly that: all the input messages plus the latest assistant message that was output.
Each time you pass a previous response ID, your new input is appended onto that stored conversation to produce a new output and a new response ID.
To have a conversation, you must continue updating the response ID that you are running the latest input against. Otherwise, you are creating a branch of the conversation in a different direction by reusing an earlier response ID as a starting point.
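A minimal sketch of that chaining pattern (placeholder model and turns; the point is that lastId is always advanced to the newest response ID):

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function chat(turns: string[]) {
  let lastId: string | undefined;
  for (const turn of turns) {
    const res = await client.responses.create({
      model: "gpt-4.1", // placeholder
      input: turn,
      previous_response_id: lastId,
    });
    // Always chain against the newest ID; reusing an older one would
    // branch the conversation from that earlier point instead.
    lastId = res.id;
    console.log(res.output_text);
  }
}

chat(["First turn", "Second turn", "Third turn"]);
```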
You have two ways to add a “developer” message:
- As the very first message that accompanies and precedes the user input when starting a session, which you do not send again;
- As an “instructions” field that you must send in every API request.
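Sketched out, with placeholder model and text (not a definitive pattern, just the two shapes described):

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function demo() {
  // Way 1: a "developer" message as the very first input item of the
  // session, never sent again; later turns chain via previous_response_id.
  const opening = await client.responses.create({
    model: "gpt-4.1", // placeholder
    input: [
      { role: "developer", content: "Only call tools when ..." }, // placeholder
      { role: "user", content: "First turn" },
    ],
  });

  // Way 2: an "instructions" field, repeated on every single request.
  await client.responses.create({
    model: "gpt-4.1",
    instructions: "Only call tools when ...",
    previous_response_id: opening.id,
    input: "Second turn",
  });
}

demo();
```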
With the former, you must end the conversation when the context grows too large and the API throws an error (the default). Using “truncation”: “auto” to discard messages once you are beyond a model’s 128k, 256k, or 1,000k input limit makes no promise that any special messages in the list are kept, and at that point…you are paying $2 for the input portion of every new API call on gpt-4.1, which also is not great.
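The truncation opt-in looks like this (a sketch; previousId is a placeholder for the latest response ID in the chain):

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function longConversationTurn(previousId: string) {
  // Opting into automatic truncation: once the model's context window
  // is exceeded, earlier items may be dropped silently, with no
  // guarantee that an opening developer message survives.
  const res = await client.responses.create({
    model: "gpt-4.1", // placeholder
    previous_response_id: previousId,
    truncation: "auto",
    input: "Another turn in a very long conversation",
  });
  return res.output_text;
}
```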
You might also be deceived about “tool” calls, because internal tools like web_search inject their own system message after the results are returned, as the most recent message, which destroys any developer application instructions.