Function calls consistently produce garbage in parameters

I'm using the Responses API with streaming. This occurs on the first request/response (though I think it can occur on later ones too). I often get a corrupted function call - not every time, but at least 30% of the time. It looks like this:

read_file({
  "line_numbers": false,
  "path": "codeai/tools/go_pkg_public_docs_test.go}'}{ errors: 'JSON decode error'}? Wait path wrong due to braces. need proper. Should be good path. Retry. maybe file path. We'll first list? we can use ls? go mode says use go tools first? But need to inspect file. maybe read_file. path ",
  "request_permission": null
})

This is from OpenAI's platform logs UI (I also see it on my end), so I think the problem is on OpenAI's side - I'm not simply parsing a correct stream incorrectly.

You can see the LLM inserts a stray '}' into the path value - it shouldn't be there, and whenever this happens it is always that character. From there, the LLM "realizes" the '}' is wrong and has a mini breakdown inside the string.
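On my end the failure shows up as an arguments string that either won't parse or parses with braces inside the path. A minimal check along these lines is enough to catch it - plain Python, no SDK dependency, the helper name is just illustrative:

import json

def tool_args_look_corrupted(arguments: str) -> bool:
    """Heuristic check for the corruption described above."""
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return True  # sometimes the stray '}' breaks the JSON outright
    # Even when it parses (as in the log above), braces inside the path
    # mean the string value itself was mangled.
    path = args.get("path", "")
    return "{" in path or "}" in path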

I get that LLMs are nondeterministic and sometimes do weird things. But this is very reproducible.

Can anyone give advice or help, confirm/deny this is an OpenAI bug, etc?

It looks like the AI generated an unacceptable token, which it is free to do inside a string value, where no grammar or enum constrains it. There is no "retry" or going back for the AI; obviously, tokens are generated one at a time.

I'm guessing this is gpt-5, and even more likely the "mini" or "nano" variant. These cheap reasoning models have very low token certainties, made worse by offering no control over sampling, so across the board they act like a random token factory in almost any application.

The way it keeps trying to "reason" inside the argument is the second hint.

For issues this bad, you could write an extensive, multi-line function description, treating the description as free text where you can explain the tool's purpose to the AI in more detail and even include an example.
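Something like this, as a rough sketch - the shape mirrors the read_file call from the log above, in the Responses API's flat function-tool format (names, example path, and exact field layout are illustrative; adjust to your real tool):

read_file_tool = {
    "type": "function",
    "name": "read_file",
    "description": (
        "Read one source file and return its contents.\n\n"
        "The 'path' value must be a plain relative file path and nothing else, e.g.\n"
        '  {"path": "src/example.go", "line_numbers": false}\n\n'
        "Never put braces, quotes, or commentary inside 'path'. If unsure of the\n"
        "path, send your best guess; the tool returns an error you can react to."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Relative path of the file to read.",
            },
            "line_numbers": {
                "type": "boolean",
                "description": "Prefix each line with its line number.",
            },
        },
        "required": ["path"],
        "additionalProperties": False,
    },
}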

Or use a better model that accepts "top_p": 0.0. OpenAI unfortunately has no clear best right now, as even gpt-4.1 is degraded.
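Roughly like this, if you do move to a model that still honors sampling parameters (a sketch only; whether 0.0 is accepted as-is or needs a tiny positive value may vary):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",         # a non-reasoning model that accepts sampling controls
    top_p=0.0,               # per the suggestion above; near-greedy decoding
    tools=[read_file_tool],  # e.g. the tool definition sketched earlier
    input="Read the file and summarize its public API.",
)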


Another technique is to allow the AI to retry in the schema itself.

Imagine you have two keys, something like "read_file_path" and "read_file_path_repeated_again".

Then instruct that both properties serve the same path purpose, and that the second exists to guarantee the path was written correctly - a place to try again if the first copy was output with an error.

Follow that up with a "best_correct_path" string property, with enum ["file_path", "file_path_repeated", "both_good", "both_bad"].

Then you've got an AI that won't keep debating with itself, but has a method to correct the field, or even to trigger its own re-run retry for you.
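A rough sketch of that parameter block (the three property names and the enum values come from the description above; everything else is illustrative):

retry_capable_parameters = {
    "type": "object",
    "properties": {
        "read_file_path": {
            "type": "string",
            "description": "Relative path of the file to read.",
        },
        "read_file_path_repeated_again": {
            "type": "string",
            "description": (
                "The same path written a second time, as a guarantee it was "
                "written correctly; if the first copy came out wrong, put the "
                "corrected path here."
            ),
        },
        "best_correct_path": {
            "type": "string",
            "enum": ["file_path", "file_path_repeated", "both_good", "both_bad"],
            "description": "Which of the two path fields, if either, is correct.",
        },
    },
    "required": [
        "read_file_path",
        "read_file_path_repeated_again",
        "best_correct_path",
    ],
    "additionalProperties": False,
}

Your client then prefers whichever field "best_correct_path" points at, and treats "both_bad" as the signal to re-run the call.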

This was gpt-5-codex high.

Thank you for the idea of using two paths - that could be a stopgap until there's a proper fix.