Description:
Since early January 2026, we’ve observed a significant and consistent regression in the behavior of the gpt-realtime model (including snapshot 2025-08-2), particularly when handling structured, factual data provided via system context.
Expected Behavior
When precise, unambiguous data is injected into the conversation (e.g., availability slots, task lists, resource calendars), the model should:
- Parse and reason over the data deterministically
- Return consistent answers for semantically equivalent user queries
- Avoid hallucination, inference, or “optimization” when raw data is available
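To make the expectation concrete: the kind of reasoning we need is trivially deterministic in code. A minimal sketch (the availability data and function name are hypothetical, purely for illustration) of grouping free days into consecutive runs:

```python
from datetime import date, timedelta

def consecutive_free_runs(free_days):
    """Group a collection of free dates into runs of consecutive days."""
    runs, run = [], []
    for d in sorted(free_days):
        # Start a new run whenever the gap to the previous day exceeds one day.
        if run and d - run[-1] != timedelta(days=1):
            runs.append(run)
            run = []
        run.append(d)
    if run:
        runs.append(run)
    return runs

free = [date(2026, 3, 17), date(2026, 3, 19), date(2026, 3, 20), date(2026, 3, 23)]
print(consecutive_free_runs(free))
# Three runs: [17], [19, 20], [23] — March 19–20 is a two-consecutive-day block.
```

Given this data, “two free days” and “two consecutive days” both have exact answers; there is nothing for the model to improvise.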
Actual Behavior
The model now:
- Interprets the same dataset differently based on minor phrasing variations (e.g., “two free days” vs. “two consecutive days”)
- Invents implicit rules not present in the instructions (e.g., “only show the first block of consecutive days”)
- Makes factual errors on simple date logic (e.g., claiming March 19–20 are not consecutive)
- Fails to produce tool_calls reliably, even when actions are clearly defined and requested
This behavior breaks production-grade voice agents that rely on precision over fluency—especially in multilingual, professional environments (project management, scheduling, resource planning).
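For perspective on the date-logic failure above: the consecutiveness check the model now gets wrong is a one-line deterministic comparison, the kind of guard our fallback layer is forced to run (a sketch, not part of any OpenAI API):

```python
from datetime import date, timedelta

def are_consecutive(a: date, b: date) -> bool:
    """True when b is exactly one calendar day after a."""
    return b - a == timedelta(days=1)

print(are_consecutive(date(2026, 3, 19), date(2026, 3, 20)))  # True
```

A model that confidently contradicts this while holding the raw dates in context cannot be trusted with scheduling decisions.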
Impact
- User trust erodes due to inconsistent responses
- Fallback systems become mandatory, increasing latency and cost
- The promise of “realtime, reliable function calling” is no longer met
Request
We urge OpenAI to:
- Restore deterministic behavior when structured context is provided
- Decouple “conversational fluency” from “factual execution”
- Provide a true “strict mode” in which the model acts as a deterministic interpreter, not an improviser
This isn’t about making the model “smarter”—it’s about making it trustworthy when real business logic depends on it.