Hi OpenAI team and community,
I’m using the Realtime API (realtime=v1) and hitting a consistent “server_error” on every response.create call, whether with text or audio input. This happens with both gpt-4o-realtime-preview-2024-10-01 and gpt-4o-realtime-preview-2024-12-17. My audio pipeline works (chunks send and commit), but the API fails to process inputs, returning zero tokens processed. I’ve been troubleshooting for days—hoping you can help pinpoint the issue or confirm a server-side problem!
Problem
What Happens: Every response.create fails with “The server had an error while processing your request”, no response.audio.delta or response.text.delta received.
When: During initialization (text prompt) and after sending audio (live or pre-recorded).
Impact: No responses generated—text or audio—despite successful session setup and audio commits.
What I’ve Tried
Models: Tested gpt-4o-realtime-preview-2024-10-01 and -2024-12-17.
Inputs:
Initial text prompt: “Hello! I’m ready to talk.”
Live audio: 6-10 second recordings via arecord (PCM16, 24000 Hz, mono).
Pre-recorded: 10-second WAV (/home/dan/test.wav, “What’s the time?”, same specs).
Audio Handling:
Started with real-time chunk sending, then switched to accumulating chunks and sending them after recording finished (mirroring a working Flask-SocketIO example).
Verified PCM16, 24000 Hz, mono format matches API docs.
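For reference, my accumulate-then-send path boils down to something like this. The 16,000-byte chunk size matches my logs; the event names (`input_audio_buffer.append` / `input_audio_buffer.commit`) are from the Realtime API docs, and the function names are just mine:

```python
import base64
import json

CHUNK_SIZE = 16000  # bytes per append event; my choice, matches the chunks in my logs

def pcm_to_append_events(pcm: bytes, chunk_size: int = CHUNK_SIZE):
    """Split raw PCM16 / 24000 Hz / mono bytes into input_audio_buffer.append
    events, base64-encoding each chunk as the API requires."""
    events = []
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset:offset + chunk_size]
        events.append(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events

# With turn_detection disabled, the buffer must be committed explicitly
# after the last chunk:
COMMIT_EVENT = json.dumps({"type": "input_audio_buffer.commit"})
```

For my 240,044-byte test file this yields 15 full chunks plus a 44-byte tail, which lines up with the "chunk 1/16" and "Sent chunk (44 bytes)" lines in the log below.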
Session Config:
"session.update" with "turn_detection": null (Python None, not the string "None"), "modalities": ["text", "audio"], "voice": "alloy", etc.
Skipped initial prompt to test bare audio input.
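Here's roughly the session.update I'm sending (the builder function is mine; I've trimmed it to the fields relevant here, and the key point is that turn_detection serializes to JSON null, not a string):

```python
import json

def build_session_update():
    """session.update event as I send it over the WebSocket."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": "alloy",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Python None serializes to JSON null, which disables server VAD
            # and requires a manual commit + response.create:
            "turn_detection": None,
        },
    }

# sent with: ws.send(json.dumps(build_session_update()))
```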
Code Adjustments:
Used websocket-client (Python) with threading; fixed reconnect crashes.
Paced events with delays to avoid overwhelming the API.
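The pacing itself is nothing fancy, just a fixed sleep between sends (the delay value is arbitrary, tuned by trial; the helper name is mine):

```python
import time

def send_paced(ws_send, events, delay_s: float = 0.05):
    """Send each event via ws_send with a small delay between sends,
    to avoid flooding the socket. Returns the number of events sent."""
    sent = 0
    for event in events:
        ws_send(event)
        sent += 1
        time.sleep(delay_s)
    return sent
```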
Validation:
API key works (model list fetches fine).
Audio commits (input_audio_buffer.committed) succeed.
Despite this, every response.create fails with “server_error”. The API accepts audio but doesn’t process it—usage shows zero tokens.
Log Data
Here’s a snippet from my latest run (March 4, 2025, 11:32 UTC) with /home/dan/test.wav (10s, 240,044 bytes):
2025-03-04 11:32:33,286 - INFO - AI_Voice - Using pre-recorded audio: /home/dan/test.wav
2025-03-04 11:32:33,293 - INFO - AI_Voice - PCM data size: 240044 bytes
2025-03-04 11:32:33,294 - INFO - AI_Voice - Accumulated chunk 1/16 (16000 bytes)
[…]
2025-03-04 11:32:37,580 - INFO - AI_Voice - Sent chunk (44 bytes)
2025-03-04 11:32:37,831 - INFO - AI_Voice - Audio buffer committed
2025-03-04 11:32:37,973 - INFO - AI_Voice - WebSocket event received: input_audio_buffer.committed
2025-03-04 11:32:39,331 - INFO - AI_Voice - Response requested
2025-03-04 11:32:39,618 - INFO - AI_Voice - Full response data: {
  "type": "response.done",
  "event_id": "event_B7KnnQe5ENyvFs0vIumdP",
  "response": {
    "id": "resp_B7KnniVkpW8hlGKXlrzGN",
    "status": "failed",
    "status_details": {
      "type": "failed",
      "error": {
        "type": "server_error",
        "message": "The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the session ID sess_B7KnazsnCarIgWqMxjYNP in your message.)"
      }
    },
    "output": [],
    "conversation_id": "conv_B7KnaUrlcvs9zpXiiAhwg",
    "modalities": ["text", "audio"],
    "usage": {"total_tokens": 0, "input_tokens": 0, "output_tokens": 0, …}
  }
}
Full logs available if needed—same error across sessions (sess_B7KVxuLfaeRpKMSSL7Km5, sess_B7KaIa5n6jng8oVTWACG4).
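In case it helps anyone reproduce or triage, the handler I use to classify a response.done event like the one above is essentially this (function name is mine; field paths match the logged event):

```python
def classify_response_done(event: dict):
    """Extract (status, error_type, total_tokens) from a response.done event,
    tolerating missing status_details / usage on successful responses."""
    resp = event.get("response", {})
    status = resp.get("status")
    error = (resp.get("status_details") or {}).get("error") or {}
    usage = resp.get("usage") or {}
    return status, error.get("type"), usage.get("total_tokens", 0)
```

For the event above this returns ("failed", "server_error", 0), which is what I log before retrying.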
Questions
Is this a known issue with these preview models as of March 2025?
Any session.update settings I might be missing? (e.g., transcription model, specific modality order?)
Could this be a server-side bug, given zero tokens processed despite committed audio?
I’ve ruled out client-side errors as best I can: the audio commits succeed, and the pipeline matches a working example.
Any insights or fixes would be hugely appreciated!
Thanks,
Dan