Bug: response.create returns audio only when response.input contains a single item

vilem · January 19, 2025, 8:59pm

I’m seeing inconsistent behavior from the response.create client event when requesting audio output. If response.input contains exactly one item, the server reliably returns an audio response. As soon as response.input contains more than one item, however, the response contains only text (no audio).

Model: gpt-4o-realtime-preview-2024-12-17
Endpoint: v1/realtime/sessions
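One way to tell the two outcomes apart programmatically is to tally the server events received after sending response.create. This is a minimal sketch, assuming the event names used by the gpt-4o-realtime-preview models (response.audio.delta, response.audio_transcript.delta, response.text.delta, response.done). summarizeResponse is a hypothetical helper, and the sample event sequences are made up for illustration rather than captured from a real session.

```javascript
// Hypothetical helper: summarize which output types a Realtime response produced,
// given the parsed server events received after sending response.create.
function summarizeResponse(events) {
  const summary = { audioDeltas: 0, transcriptDeltas: 0, textDeltas: 0, done: false };
  for (const event of events) {
    switch (event.type) {
      case "response.audio.delta": // a chunk of base64-encoded audio
        summary.audioDeltas += 1;
        break;
      case "response.audio_transcript.delta": // transcript of the spoken audio
        summary.transcriptDeltas += 1;
        break;
      case "response.text.delta": // plain text output (no audio)
        summary.textDeltas += 1;
        break;
      case "response.done":
        summary.done = true;
        break;
      default:
        // Other events (response.created, output item events, ...) are ignored here.
        break;
    }
  }
  // Audio was returned only if at least one audio chunk arrived.
  summary.hasAudio = summary.audioDeltas > 0;
  return summary;
}

// Made-up event sequences resembling the two reported cases.
const withAudio = summarizeResponse([
  { type: "response.created" },
  { type: "response.audio_transcript.delta", delta: "Why did" },
  { type: "response.audio.delta", delta: "AAAA" },
  { type: "response.done" },
]);

const textOnly = summarizeResponse([
  { type: "response.created" },
  { type: "response.text.delta", delta: "Why did" },
  { type: "response.done" },
]);

console.log(withAudio.hasAudio, textOnly.hasAudio); // prints "true false"
```

In terms of this sketch, the single-item case below would be expected to yield hasAudio true, while the reported multi-item behavior corresponds to hasAudio false with textDeltas greater than zero.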

This call returns both audio & text:

{
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    output_audio_format: "pcm16",
    input: [
      { type: "message", role: "user", content: [{ type: "input_text", text: "Tell me a joke" }] }
    ]
  }
}

This call returns only text (no audio):

{
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    output_audio_format: "pcm16",
    input: [
      { type: "message", role: "assistant", content: [{ type: "text", text: "Hi I am your assistant, ask whatever." }] },
      { type: "message", role: "user", content: [{ type: "input_text", text: "Tell me a joke" }] }
    ]
  }
}
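To make sure the two payloads above differ only in the number of response.input items, they can be generated by a small helper. This is a minimal sketch, assuming plain JavaScript and that each event is serialized to JSON before being sent over the session's transport (WebSocket or WebRTC data channel). userText, assistantText and buildResponseCreate are hypothetical helpers, not part of any OpenAI SDK; the content part types mirror the payloads above (input_text for the user message, text for the assistant message).

```javascript
// Hypothetical helpers (not part of any OpenAI SDK) that rebuild the
// response.create payloads from this post.

// A user message item; user content parts use type "input_text".
function userText(text) {
  return { type: "message", role: "user", content: [{ type: "input_text", text }] };
}

// An assistant message item; assistant content parts use type "text".
function assistantText(text) {
  return { type: "message", role: "assistant", content: [{ type: "text", text }] };
}

// Build a response.create event requesting audio + text output for the given items.
function buildResponseCreate(items) {
  return {
    type: "response.create",
    response: {
      modalities: ["audio", "text"],
      output_audio_format: "pcm16",
      input: items,
    },
  };
}

// Case 1: a single item (reported to return audio and text).
const single = buildResponseCreate([userText("Tell me a joke")]);

// Case 2: two items (reported to return text only).
const multi = buildResponseCreate([
  assistantText("Hi I am your assistant, ask whatever."),
  userText("Tell me a joke"),
]);

// Serialize before sending; the actual send call depends on the transport
// (e.g. a hypothetical ws.send(singleJson) for a WebSocket connection).
const singleJson = JSON.stringify(single);
const multiJson = JSON.stringify(multi);
```

Sending singleJson and then multiJson on the same session keeps every field identical except the length of response.input, which isolates the difference described above.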
