I would be interested in having you save and share a Prompts Playground preset. Your function and schema are not transparent enough for anyone to reproduce the issue.
You say you have a json_schema, but you don't show us what it contains, or whether you are using "strict".
It is unlikely, though, that your items array is supposed to produce "location" or "temperature" as output.
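For anyone following along, this is the general shape a strict structured-output schema takes on Chat Completions. The schema name and the "location"/"temperature" fields below are placeholders guessed from the symptoms described, not the poster's actual schema:

```python
# Hypothetical strict response_format, guessed from the "location"/
# "temperature" strings seen in the bad output; NOT the poster's real schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # placeholder name
        "strict": True,            # enforces the schema with constrained decoding
        "schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "temperature": {"type": "number"},
            },
            "required": ["location", "temperature"],
            "additionalProperties": False,  # required when strict is true
        },
    },
}
```

If "strict" is absent or false, the model is merely encouraged to follow the schema, which changes the failure mode considerably.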
You are using Responses, which has an internal tool iterator. It is also clearly failing to stop output on ANY special token of the ChatML container: you can see restarts of "assistant" that the API backend has captured and placed in an output list.
Responses should obviously catch a case like this, where the model repeats the start of an assistant message, but it is not doing so.
Here's what I think is the basic fault showing through:
The AI must be post-trained on calling functions. It must not close the preamble of the chat message container and start an output; instead it must address a different token to initiate the tool-recipient backend handler.
Your AI is going right for the user response. It cannot backtrack and correct itself when it should have called a function (well, technically it can, but it won't). Thus you get your response schema emitted instead of a function call.
Then the AI sees that it has made an error, that no function was invoked, so it continues repeating, at which point it has even less chance of invoking the function correctly.
This could be mitigated a bit on Chat Completions with logit_bias. Responses, though, is built for the developer-as-dummy, and has no facilities for token-level data or manipulation.
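A sketch of what that mitigation would look like on Chat Completions: pass a logit_bias map that bans the token(s) the model keeps producing. Token IDs are tokenizer-specific; 12345 below is a stand-in, not the real ID of any token (you would look up the actual IDs with tiktoken for your model's encoding):

```python
# Sketch of suppressing an unwanted token on Chat Completions via logit_bias.
# The token ID 12345 is illustrative only; real IDs come from the model's
# tokenizer (e.g. tiktoken). A bias of -100 effectively bans the token.
payload = {
    "model": "gpt-4o",  # example model name
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "logit_bias": {"12345": -100},  # keys are token IDs as strings
}
```

This is exactly the kind of token-level control Responses gives you no access to.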
Here's what I would do to address the issue:
Give the AI an anyOf schema. It would have a second subschema, described as "error": a schema path for when there is insufficient information, or when the model has improperly diverted straight to output.
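A minimal sketch of such a schema, with illustrative field names. Note one assumption worth verifying against the current docs: strict structured outputs has historically disallowed anyOf at the schema root, so the union here is wrapped in a single "result" property:

```python
# Sketch of a schema with an "error" escape path. Field names are
# illustrative. The anyOf union is wrapped in a "result" property because
# strict mode has not allowed anyOf at the root level.
schema = {
    "type": "object",
    "properties": {
        "result": {
            "anyOf": [
                {   # normal answer path
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "temperature": {"type": "number"},
                    },
                    "required": ["location", "temperature"],
                    "additionalProperties": False,
                },
                {   # error path: taken when information is insufficient or
                    # the model diverted to output instead of calling a tool
                    "type": "object",
                    "properties": {
                        "error": {
                            "type": "string",
                            "description": "Why a valid answer could not be produced",
                        },
                    },
                    "required": ["error"],
                    "additionalProperties": False,
                },
            ]
        }
    },
    "required": ["result"],
    "additionalProperties": False,
}
```

The point is to give the model a legal place to land when it realizes mid-generation that it cannot satisfy the primary schema, instead of looping.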
Secondly, you could place in the function description itself a reminder of how to invoke that function. Without going deep into why, you'd write: "To send to this tool recipient you immediately generate ` to`, as in `assistant to`, and you do not begin a normal response".
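Concretely, that reminder goes into the tool's description field. The function name and parameters below are placeholders, not the poster's actual tool:

```python
# Sketch of embedding the invocation reminder in the tool definition.
# "get_weather" and its parameters are placeholders for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": (
            "Get the current weather for a location. "
            "To send to this tool recipient you immediately generate ` to`, "
            "as in `assistant to`, and you do not begin a normal response."
        ),
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
```

This leans on the model reading its own tool descriptions at inference time, which is a prompt-level patch, not a real fix.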
Then OpenAI must fix the model, fix the endpoint, and fix the many other things that make Responses not worth using just for two internal tools that do not meet a developer's needs.