Realtime V2 is giving long responses

chinmay1 · May 23, 2026, 12:16am

We have been using Realtime API since preview and switched to V2. We see that V2 likes giving long responses. For example, it would say “Hi I am a Bot. What kind of car are you looking for? Any particular color? Would you like to schedule test drive”

Previous versions would be asking one or two question at a time and build on that.

Any suggestions?

_j · May 23, 2026, 7:27am

Your only control surface to alter the model behavior is the system message you provide. You’ll need to communicate the type of responses that are expected to counter any seen symptoms.

A model ends its chat turn by producing a stop sequence. On chat models, this is a trained special token that is emitted. Different models have different qualities of predicting a stop token to end the message, versus predicting the next word, the next word of another sentence. Some recent models from OpenAI have been quite bad at this, where they simply can’t end, and even repeat the same response, starting over all by themselves or making a new message start without termination being output.

You can’t really communicate, " you produce a stop token which ends your turn", as the AI isn’t really self-aware that is what it is doing, and actually placing the special tokens by their string mapping is disallowed. What you can do is reinforce the style of conversation exchange. This might take the form, “a Bot assistant produces only one question, not a sequence of question sentences, and then the user replies to that single question in their own message.”

Responses and the realtime API don’t let you provide your own stop sequence that can be trained or instructed, unlike Chat Completions, which is for developers that understand AI instead of consumers that want an over-featured product to be used only one designed way.

This reinforces that when post-training the weights of a model on its message format, especially those models with different modalities, the right amount of reinforcement of predicting stop sequences is important because

lahiru · May 24, 2026, 7:23am

Anything else did you notice about this model ?

_j · May 24, 2026, 8:39am

I am not delivering any products with realtime, so I don’t have particular applications that need refinement where I experience differences in symptom cross model, except for experimentally using them, and seeing more the quality in voices and steerability.

What I notice as part of the product envelope, that I can bring to your attention, is the client event to create a conversation item, which can be a mid-session tune up for increased following (one that will be part of the conversation that can be truncated out), and can be textual and a system role message.

RealtimeConversationItemSystemMessage
A system message in a Realtime conversation can be used to provide additional context or instructions to the model. This is similar but distinct from the instruction prompt provided at the start of a conversation, as system messages can be added at any point in the conversation. For major changes to the conversation’s behavior, use instructions, but for smaller updates (e.g. “the user is now asking about a different topic”), use system messages.

So, you have:

tune up gpt-realtime-2 with messaging;
don’t migrate, wait for a model that can fulfill your needs.

Topic		Replies	Views
Chat API returns gibberish/code/foreign words after a few rounds API	6	581	October 20, 2024
Gpt-4o-realtime-preview - non-sense responses with larger system prompt Bugs realtime	13	441	July 9, 2025
🚀 gpt-realtime-1.5 is live in Realtime API API voice , realtime-api	18	5021	March 27, 2026
Responses API producing different behavior after migration from Completion API Feedback	3	313	September 4, 2025
Realtime API instruction limit (16,384 tokens) is too low for production voice agents with tool calling Feedback feature-request , api-realtime	4	214	May 11, 2026

Realtime V2 is giving long responses

Related topics