Realtime API server_error causing no audio output

Realtime API server_error causing no audio output (Session ID: sess_D18SqiEVpovQRSTuujTQt)

Hi OpenAI Support,

We are using the OpenAI Realtime API (speech-to-speech) in a Twilio phone call integration. During a call, the model attempted to generate an audio response but the response failed with a server_error, resulting in no audio being delivered to the caller.

Session ID:

  • sess_D18SqiEVpovQRSTuujTQt

Observed behavior:

  • A response is created, but the final response.done indicates status=failed with error.type=server_error.
  • No audio delta chunks were produced (so the user hears silence).

Relevant IDs:

  • call_sid: CA7d6d7e9d889fc90e3f6b689dd98b230e
  • response_id: resp_D18Sz8bO2aCWvnDp1ERmb

Timestamp (UTC):

  • 2026-01-23T10:14:06.948Z (response.done with server_error)

Error excerpt:

  • “The server had an error while processing your request… (include session ID in your message: sess_D18SqiEVpovQRSTuujTQt)”

Request:
Could you please investigate why this session produced a server_error during audio generation, and advise whether there are known issues or recommended mitigations? If you need more logs or reproduction details, tell us what to provide.

Thank you.

Troubleshooting details (Realtime server_error / no audio)

Error type:

  • server_error (from Realtime response.status_details.error.type)

Session ID:

  • sess_D18SqiEVpovQRSTuujTQt

Timestamps (UTC):

  • 2026-01-23T10:14:05.934Z (session.updated / session preview captured)
  • 2026-01-23T10:14:06.425Z (response.created)
  • 2026-01-23T10:14:06.948Z (response.done → failed with server_error)
  • 2026-01-23T10:14:07.702Z (Twilio stream stopped)
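For anyone reading the timeline, the gaps between these events can be computed directly from the timestamps. The sketch below is only illustrative: the event labels and the helper names (`parse_utc`, `gaps_ms`) are made up for this example, and the timestamps are the four listed above.

```python
from datetime import datetime, timedelta

# Timestamps copied from the list above (UTC). Labels are informal.
EVENTS = [
    ("session.updated", "2026-01-23T10:14:05.934Z"),
    ("response.created", "2026-01-23T10:14:06.425Z"),
    ("response.done (failed)", "2026-01-23T10:14:06.948Z"),
    ("Twilio stream stopped", "2026-01-23T10:14:07.702Z"),
]


def parse_utc(ts):
    """Parse an ISO 8601 timestamp with fractional seconds and a trailing 'Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_ms(events):
    """Return (from_label, to_label, gap_in_ms) for each pair of consecutive events."""
    result = []
    for (a_label, a_ts), (b_label, b_ts) in zip(events, events[1:]):
        delta = parse_utc(b_ts) - parse_utc(a_ts)
        # Integer division by a 1 ms timedelta avoids float rounding.
        result.append((a_label, b_label, delta // timedelta(milliseconds=1)))
    return result


if __name__ == "__main__":
    for a, b, ms in gaps_ms(EVENTS):
        print(f"{a} -> {b}: {ms} ms")
```

On these values the failure arrives 523 ms after response.created, and the Twilio stream stops 754 ms after that.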

Models used:

  • Realtime model: gpt-realtime (from session_preview.model)
  • Input transcription model: gpt-4o-mini-transcribe (from twilio_realtime_session_updated)
  • Voice: echo
  • Output modalities: ["audio"]
  • Audio format: input audio/pcmu, output audio/pcmu

Impacted endpoint:

  • OpenAI Realtime WebSocket endpoint: /v1/realtime (WebSocket; server-side via Twilio integration)
    (Our service uses the OpenAI Realtime WebSocket URL configured as OPENAI_REALTIME_URL.)

Impacted request identifiers:

  • call_sid: CA7d6d7e9d889fc90e3f6b689dd98b230e
  • stream_sid: MZ261182ca44a5c871ca7c2896399d4239
  • response_id: resp_D18Sz8bO2aCWvnDp1ERmb
  • conversation_id: conv_D18Sqe07VZEZbH51GfDar

Observed behavior:

  • response.created is received, then response.done returns status=failed with server_error.
  • No output audio delta chunks were produced (caller hears silence).

Approx. failure rate:

  • Based on our recent server logs (last ~2h / ~24h), we observed:
    response.done total: 3
    failed total: 2
    failed due to server_error: 2
    => server_error failure rate: 2 of 3 (~67%; small sample size, but recurring)
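The counts above can be reproduced from logged response.done events with a small tally. This is a rough sketch, not our actual logging code: the `summarize` helper is hypothetical, the event shape follows the response.status / response.status_details.error.type fields quoted in this post, and the "completed" status used for the third sample event is an assumption (the logs above only say it was not a failure).

```python
def summarize(done_events):
    """Count response.done outcomes and compute the share that failed with server_error."""
    total = len(done_events)
    failed = 0
    server_errors = 0
    for event in done_events:
        response = event.get("response", {})
        if response.get("status") != "failed":
            continue
        failed += 1
        error = (response.get("status_details") or {}).get("error") or {}
        if error.get("type") == "server_error":
            server_errors += 1
    rate = (server_errors / total * 100) if total else 0.0
    return {
        "total": total,
        "failed": failed,
        "server_error": server_errors,
        "server_error_rate_pct": round(rate, 1),
    }


# Hypothetical sample mirroring the counts reported above: 3 events, 2 server_error failures.
_FAILED = {
    "type": "response.done",
    "response": {
        "status": "failed",
        "status_details": {"type": "failed", "error": {"type": "server_error"}},
    },
}
# Assumption: the non-failed event is shown as "completed"; its real status was not logged here.
_OTHER = {"type": "response.done", "response": {"status": "completed"}}
SAMPLE = [_FAILED, _FAILED, _OTHER]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

With only three events the percentage is not very meaningful, so reporting the raw counts alongside it is probably more useful to support.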

Client request/response logs (excerpt):

  • session.updated (preview includes session id and model):
    ts=2026-01-23T10:14:05.934Z event=twilio_realtime_session_updated
    session_preview contains id=sess_D18SqiEVpovQRSTuujTQt and model=gpt-realtime

  • response.done failed with server_error:
    ts=2026-01-23T10:14:06.948Z event=twilio_realtime_response_done
    response_status=failed
    response_status_details={"type":"failed","error":{"type":"server_error",…,"message":"… include session ID: sess_D18SqiEVpovQRSTuujTQt …"}}
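While waiting for an answer, one client-side mitigation we are considering is to detect the failed response.done and ask for a new response instead of leaving the caller in silence. The sketch below is an assumption on our side, not a confirmed recommendation: `classify_response_done`, `next_client_event`, and the retry limit are hypothetical names and values, and we are assuming that sending a plain response.create client event is an acceptable way to retry. The event fields mirror the log excerpt above, with the elided parts left out.

```python
# Assumption: only server-side faults are worth retrying automatically.
RETRYABLE_ERROR_TYPES = {"server_error"}


def classify_response_done(event):
    """Extract status and error type from a response.done event payload."""
    response = event.get("response", {})
    details = response.get("status_details") or {}
    error = details.get("error") or {}
    status = response.get("status")
    error_type = error.get("type")
    return {
        "response_id": response.get("id"),
        "status": status,
        "error_type": error_type,
        "retryable": status == "failed" and error_type in RETRYABLE_ERROR_TYPES,
    }


def next_client_event(event, attempts_so_far, max_attempts=2):
    """Return a response.create event to send as a retry, or None to give up."""
    info = classify_response_done(event)
    if info["retryable"] and attempts_so_far < max_attempts:
        return {"type": "response.create"}
    return None


# Payload shaped like the failed event in the excerpt above (message omitted).
FAILED_EVENT = {
    "type": "response.done",
    "response": {
        "id": "resp_D18Sz8bO2aCWvnDp1ERmb",
        "status": "failed",
        "status_details": {"type": "failed", "error": {"type": "server_error"}},
    },
}

if __name__ == "__main__":
    print(classify_response_done(FAILED_EVENT))
    print(next_client_event(FAILED_EVENT, attempts_so_far=0))
```

Capping the number of attempts matters here: if the failure is persistent for a session, an unbounded retry loop would keep the caller waiting on a call that cannot recover.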

Local timeouts / networking checks:

  • This is a server-side Twilio integration. We did not observe client-side audio deltas for this response; the session ended shortly after.
  • Please advise what additional timing metrics you’d like (e.g., server-to-OpenAI connect latency, response streaming timings).

There are a few things to check:

  1. Check that you are using an audio format that OpenAI supports.
  2. If you run local VAD with your own gating logic, it may be blocking audio from being sent, which could cause this issue.
  3. Check the "modalities": ["audio", "text"] setting, and confirm that you are actually sending audio at all.
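These checks can be run as a quick sanity test before a call goes live. The sketch below is a rough illustration under several assumptions: `check_session` is a hypothetical helper, the flattened `summary` dict is not the real session object, `input_bytes_sent` stands for whatever counter you keep of audio actually forwarded to OpenAI, and the set of supported format names is my understanding of the Realtime docs and should be verified against the current reference.

```python
# Assumption: format names accepted by the Realtime API; verify against the current docs.
SUPPORTED_FORMATS = {"audio/pcm", "audio/pcmu", "audio/pcma"}
# Twilio Media Streams carry 8 kHz G.711 mu-law audio, which corresponds to audio/pcmu.
TWILIO_FORMAT = "audio/pcmu"


def check_session(summary, input_bytes_sent):
    """Return a list of human-readable warnings for a flattened session summary."""
    warnings = []
    for key in ("input_format", "output_format"):
        fmt = summary.get(key)
        if fmt not in SUPPORTED_FORMATS:
            warnings.append(f"{key}={fmt!r} is not in the assumed supported set")
        elif fmt != TWILIO_FORMAT:
            warnings.append(f"{key}={fmt!r} needs transcoding to/from Twilio mu-law")
    if "audio" not in summary.get("output_modalities", []):
        warnings.append("output modalities do not include 'audio'")
    if input_bytes_sent == 0:
        warnings.append("no input audio was sent; check whether local VAD is gating it")
    return warnings


# Values reported in the original post for the failing session.
REPORTED = {
    "input_format": "audio/pcmu",
    "output_format": "audio/pcmu",
    "output_modalities": ["audio"],
}

if __name__ == "__main__":
    # 16000 is an arbitrary non-zero byte count used for illustration.
    print(check_session(REPORTED, input_bytes_sent=16000))
```

With the values from the original post and a non-zero byte count, no warnings are raised, so the remaining thing to confirm is whether audio was actually reaching the input buffer during that call.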