Utterance prepend improves model responses for chat completions

When submitting chat completions requests for LLM-generated responses, we’ve found that prepending brief instructions to the last submitted utterance can greatly improve LLM response behavior. Relying solely on the system prompt at the beginning of a long conversation seems to lead to degraded responses (as though the instructions are forgotten or carry less weight). By adding important prompt directives in front of the last utterance in the messages array, we give the LLM a “current” reminder that it seems to honor.
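A minimal sketch of that idea — the directive text and helper name here are illustrative, not part of any API; the point is that only the outgoing request is modified, never the stored history:

```python
# Sketch: prepend a short directive to the last user message before each
# chat completions call. PROMPT_DIRECTIVE is an illustrative example.
PROMPT_DIRECTIVE = "(Answer concisely. Always call tools for time/date questions.) "

def prepend_to_last_utterance(messages, directive=PROMPT_DIRECTIVE):
    """Return a copy of messages with the directive prepended to the
    final user message; the stored history is left untouched."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = directive + m["content"]
            break
    return out

history = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "What time is it?"},
]
# Pass request_messages to the chat completions call; keep `history` clean.
request_messages = prepend_to_last_utterance(history)
```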

The Realtime API doesn’t provide a way to pass additional fields in the session.update message (though extra fields don’t seem to be rejected). We have added an “utterance_prepend” text field so the client can override anything loaded on the server. On the server, we prepend this to the last utterance just before submitting it to the agent LLM, but we do not store it in the chat history we maintain. Because we use session.update, we only allow changes that apply for the entire scope of the conversation; we weren’t sure whether an arbitrary session.update could be sent mid-conversation. If that is possible, it would open up some nice options for guiding and improving the conversation and the generated responses.
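A rough sketch of the server side, assuming an event dict shaped like the session.update message; “utterance_prepend” is our own extension field, and everything else here (state handling, helper names) is illustrative:

```python
# Sketch: pick up a custom "utterance_prepend" field from session.update
# and apply it per-request, without persisting it into the chat history.
session_state = {"utterance_prepend": ""}

def handle_session_update(event):
    """Client-supplied value overrides whatever was loaded on the server."""
    prepend = event.get("session", {}).get("utterance_prepend")
    if prepend is not None:
        session_state["utterance_prepend"] = prepend

def build_llm_messages(chat_history):
    """Apply the prepend to the outgoing copy only; stored history stays clean."""
    msgs = [dict(m) for m in chat_history]
    if session_state["utterance_prepend"] and msgs and msgs[-1]["role"] == "user":
        msgs[-1]["content"] = session_state["utterance_prepend"] + msgs[-1]["content"]
    return msgs
```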

Finally, I had hoped that a tool’s description would serve as a prompt of sorts for the LLM’s tool-calling mechanism. That doesn’t appear to be the case. For testing purposes, we have a simple tool that returns the time of day. We have tried telling the LLM to always call the tool when time or date information is requested (so it won’t find an earlier reference in the chat and echo it back). Could we consider a formal way to have tool-specific prompts? I don’t want to add tool-specific details to the general system prompt, as the number of tools will grow, and their directives only seem to matter when the LLM is deciding whether to call a tool.
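For context, this is roughly what our test tool looks like — the name and wording are illustrative, with the directive carried in the description field since that is the only tool-scoped prompt available today:

```python
# Sketch: a Chat Completions function tool whose description carries
# the "always call me" directive.
time_tool = {
    "type": "function",
    "function": {
        "name": "get_time_of_day",
        "description": (
            "Returns the current time of day. ALWAYS call this tool when the "
            "user asks about the time or date; never reuse a time mentioned "
            "earlier in the conversation."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}
```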


I had advice for you that saves paying twice for a large input, instead of using tool calling, in a way that can then be stripped off with almost zero cache loss. However, running it against gpt-5.2 reveals either a very dumb model or OpenAI stripping out messages you sent:

The AI made no reference to knowing the time in its first response, so I had to move the “post-prompt” up and try again.

And then it is still not quite right, even if you make this seem to come from a user:

(Actually, it seems gpt-5.2 simply doesn’t trust anything coming from a developer-role message to be truthful.)

A massive model failure, which gpt-4.1 understands perfectly as a “developer” post-prompt:

So you can try that post-prompt out on realtime and see what understanding you get — or put it right at the top of a user text part, within the message, and leave it in context as a message-sent time.
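The second option might look something like this — a sketch, with the bracketed stamp format being just one way to phrase it:

```python
# Sketch: embed the send time at the top of the user text part, so it
# stays in context as an ordinary part of the user's own turn.
from datetime import datetime, timezone

def stamped_user_message(text):
    """Build a user message whose text part begins with the send time."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return {
        "role": "user",
        "content": [{"type": "text", "text": f"[message sent {stamp}]\n{text}"}],
    }
```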


The function’s main “description” field should be all the prompt the AI needs. It should explain what the function is for and what results it will return, so that the AI knows when function calling is necessary as a first step — and yes, you can state requirements and prohibitions there.
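As one illustration of that principle — the function name, parameters, and wording below are invented for the example, but they show a description that covers purpose, returned results, a requirement, and a prohibition in one place:

```python
# Sketch: a description-driven function spec. Everything the model needs
# to decide when (and when not) to call lives in the description.
lookup_order = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": (
            "Look up an order by ID and return its status and delivery ETA. "
            "Required before answering any shipping question. "
            "Never guess an order status; if no order ID was given, ask the "
            "user for one instead of calling this function."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order ID, e.g. 'A-1023'.",
                },
            },
            "required": ["order_id"],
        },
    },
}
```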