I tried using the Realtime API for audio transcription, specifically the gpt-4o-transcribe model. I'm getting responses that make me believe I may not have access to the Realtime API service. Does anyone know if the Realtime API is generally available, or is there a special request process to get access?
More details on the exact behavior I'm seeing:
I am encountering a consistent 403 Forbidden error when attempting to establish a WebSocket connection to the Realtime API endpoint (wss://api.openai.com/v1/realtime/transcription_sessions). The error message received is: “The server returned status code ‘403’ when status code ‘101’ was expected.”
I have performed the following debugging steps:
API Key Validation: My primary OpenAI API key authenticates successfully against other OpenAI endpoints (e.g., curl https://api.openai.com/v1/models), which confirms the key itself is valid.
Session Creation via HTTPS POST: Strangely, I get a well-formed response back containing an ephemeral client secret, but its expiry timestamp is already in the past. Is that another way of telling me I don't have access?
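For what it's worth, here is a minimal sketch of how I'm checking whether the ephemeral secret from the session-creation response is genuinely expired. The response field names (client_secret.value, client_secret.expires_at, a Unix-seconds timestamp) are assumptions based on the payload I'm seeing, so adjust if your response differs:

```csharp
using System;
using System.Text.Json;

static class EphemeralSecretCheck
{
    // Returns true when the ephemeral secret's expiry is at or before "now".
    public static bool IsExpired(string sessionJson, long nowUnixSeconds)
    {
        using var doc = JsonDocument.Parse(sessionJson);
        long expiresAt = doc.RootElement
            .GetProperty("client_secret")
            .GetProperty("expires_at")
            .GetInt64();
        return expiresAt <= nowUnixSeconds;
    }

    public static void Main()
    {
        // Hypothetical response shape based on the behavior described above.
        string json = "{ \"client_secret\": { \"value\": \"ek_abc\", \"expires_at\": 0 } }";
        long now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        Console.WriteLine(IsExpired(json, now)
            ? "Ephemeral secret is already expired."
            : "Ephemeral secret is still valid.");
    }
}
```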
Any guidance or clarification on this 403 error would be greatly appreciated.
Yes, Realtime is available for me in the Playground. I can select gpt-4o-transcribe as the "user transcript model" and gpt-4o-realtime-preview as the "model", and I can engage in a conversational back-and-forth, although I can't tell whether it is actually using gpt-4o-transcribe, which is the model of interest for me.
This is the code snippet (C#) where I create the WebSocket and attempt to connect:
public async Task StartRealtimeSessionAsync(string prompt, string sessionId)
{
    LogService.Instance.LogMedium($"[OpenAiModelProvider] Starting realtime session for {ModelId}.", sessionId);
    _realtimeCancellationTokenSource = new CancellationTokenSource();
    _realtimeCompletionSource = new TaskCompletionSource<string>();
    _realtimeTranscriptBuilder.Clear();
    _realtimeTranscriptParts.Clear();
    _lastItemId = string.Empty;

    try
    {
        _webSocket = new ClientWebSocket();
        _webSocket.Options.SetRequestHeader("Authorization", $"Bearer {OPENAI_API_KEY}");
        await _webSocket.ConnectAsync(new Uri(REALTIME_API_URL), _realtimeCancellationTokenSource.Token);

        // Send initial configuration
        var configMessage = new JObject(
            new JProperty("type", "transcription_session.update"),
            new JProperty("session", new JObject(
                new JProperty("input_audio_format", "pcm16"), // Realtime pcm16 is 24 kHz, 16-bit, mono; AudioService resamples to 24 kHz
                new JProperty("modalities", new JArray("text")),
                new JProperty("input_audio_transcription", new JObject(
                    new JProperty("model", ModelId),
                    new JProperty("prompt", prompt),
                    new JProperty("language", "en") // TODO: assuming English for now; can be made dynamic
                )),
                new JProperty("input_audio_noise_reduction", new JObject(
                    new JProperty("type", "near_field") // Enable noise reduction
                )),
                new JProperty("turn_detection", (object?)null) // Disable server-side turn detection
            ))
        );
        await _webSocket.SendAsync(Encoding.UTF8.GetBytes(configMessage.ToString()), WebSocketMessageType.Text, true, _realtimeCancellationTokenSource.Token);

        // Start listening for messages in a background task
        _ = ReceiveMessagesAsync(sessionId, _realtimeCancellationTokenSource.Token);
        LogService.Instance.LogMedium($"[OpenAiModelProvider] Realtime session started for {ModelId}.", sessionId);
    }
    catch (Exception ex)
    {
        LogService.Instance.LogError($"[OpenAiModelProvider] Error starting realtime session for {ModelId}", ex, sessionId);
        _realtimeCompletionSource?.TrySetException(ex);
        DisposeRealtimeSession();
        throw;
    }
}
const string MODEL = "gpt-4o-transcribe"; // or use ?intent=transcription
var ws = new ClientWebSocket();
// Required sub-protocols (ordered)
ws.Options.AddSubProtocol("realtime");
ws.Options.AddSubProtocol($"openai-insecure-api-key.{OPENAI_API_KEY}");
ws.Options.AddSubProtocol("openai-beta.realtime-v1");
// OR: keep your Bearer header instead of the “insecure” sub-protocol
// ws.Options.SetRequestHeader("Authorization", $"Bearer {OPENAI_API_KEY}");
// ws.Options.SetRequestHeader("OpenAI-Beta", "realtime=v1");
// Correct URI
var uri = new Uri($"wss://api.openai.com/v1/realtime?model={MODEL}");
await ws.ConnectAsync(uri, CancellationToken.None);
You are connecting to wss://api.openai.com/v1/realtime/**transcription_sessions**.
The Realtime API expects you to connect to wss://api.openai.com/v1/realtime and pass either ?model=gpt-4o-transcribe or ?intent=transcription.
I would try to get it up and running with Python and the OpenAI library first, then port it to C# if you have to.
The problem was indeed the URL and the query parameters. After fixing them, I was able to connect successfully. In case it’s helpful for others, here’s the updated code snippet that works:
private const string REALTIME_API_URL = "wss://api.openai.com/v1/realtime?intent=transcription";
public async Task StartRealtimeSessionAsync(string prompt, string sessionId)
{
    LogService.Instance.LogMedium($"[OpenAiModelProvider] Starting realtime session for {ModelId}.", sessionId);
    _realtimeCancellationTokenSource = new CancellationTokenSource();
    _realtimeCompletionSource = new TaskCompletionSource<string>();
    _realtimeTranscriptBuilder.Clear();
    _realtimeTranscriptParts.Clear();
    _lastItemId = string.Empty;

    try
    {
        _webSocket = new ClientWebSocket();
        _webSocket.Options.SetRequestHeader("Authorization", $"Bearer {OPENAI_TEMP_CLIENT_SECRET}");
        _webSocket.Options.SetRequestHeader("OpenAI-Beta", "realtime=v1");
        await _webSocket.ConnectAsync(new Uri(REALTIME_API_URL), _realtimeCancellationTokenSource.Token);

        // Send initial configuration
        var configMessage = new JObject(
            new JProperty("type", "transcription_session.update"),
            new JProperty("session", new JObject(
                new JProperty("input_audio_format", "pcm16"), // Realtime pcm16 is 24 kHz, 16-bit, mono; AudioService resamples to 24 kHz
                // new JProperty("modalities", new JArray("text")),
                new JProperty("input_audio_transcription", new JObject(
                    new JProperty("model", ModelId),
                    new JProperty("prompt", prompt),
                    new JProperty("language", "en") // TODO: assuming English for now; can be made dynamic
                )),
                // new JProperty("turn_detection", new JObject(
                //     new JProperty("type", "semantic_vad")
                // ))
                new JProperty("turn_detection", (object)null) // Disable server-side turn detection
            ))
        );
        await _webSocket.SendAsync(Encoding.UTF8.GetBytes(configMessage.ToString()), WebSocketMessageType.Text, true, _realtimeCancellationTokenSource.Token);

        // Start listening for messages in a background task
        _ = ReceiveMessagesAsync(sessionId, _realtimeCancellationTokenSource.Token);
        LogService.Instance.LogMedium($"[OpenAiModelProvider] Realtime session started for {ModelId}.", sessionId);
    }
    catch (Exception ex)
    {
        LogService.Instance.LogError($"[OpenAiModelProvider] Error starting realtime session for {ModelId}", ex, sessionId);
        _realtimeCompletionSource?.TrySetException(ex);
        DisposeRealtimeSession();
        throw;
    }
}