Hello,
I am currently developing a real-time speech-to-text application using OpenAI’s Realtime API. However, I keep encountering the following error when sending audio data to the API:
{
  type: 'error',
  event_id: 'event_AWc6bRvTpA5q4zzyFfByl',
  error: {
    type: 'invalid_request_error',
    code: 'unknown_parameter',
    message: "Unknown parameter: 'session'.",
    param: 'session',
    event_id: null
  }
}
I cannot figure out the root cause of this issue and would greatly appreciate your guidance.
What I am trying to achieve:
- Send real-time audio data from an Android app to OpenAI’s Realtime API for transcription.
- Use a Node.js server to act as a proxy, relaying the audio data from the Android app to the API.
Development environment:
- Server: Node.js v14 with WebSocket server (Port: 3000)
- Client: Android app (Java)
- OpenAI Model: gpt-4o-realtime-preview-2024-10-01
- Node.js dependencies: ws, mic, dotenv
Key parts of my server code:
- Session initialization (to start the transcription session):
const sessionRequest = {
  type: 'session.update',
  modalities: ['audio', 'text'],
  instructions: 'Transcribe audio in real-time.',
  input_audio_format: 'pcm16',
  input_audio_transcription: { model: 'whisper-1' },
  turn_detection: {
    type: 'server_vad',
    threshold: 0.5,
    prefix_padding_ms: 300,
    silence_duration_ms: 500,
  },
};
ws.send(JSON.stringify(sessionRequest));
- Sending audio data to OpenAI:
if (sessionId) {
  ws.send(
    JSON.stringify({
      type: 'input_audio_buffer.append',
      session: sessionId,
      audio: message.toString('base64'),
      encoding: 'pcm16',
    })
  );
  ws.send(
    JSON.stringify({
      type: 'input_audio_buffer.commit',
      session: sessionId,
    })
  );
}
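For comparison, here is a minimal sketch of how these client events might be shaped. It assumes, based on my reading of the Realtime API reference for this preview model, that session.update nests its settings under a session object, while input_audio_buffer.append and input_audio_buffer.commit carry no session identifier (the WebSocket connection itself identifies the session) and no encoding field. The helper names are hypothetical and not part of any SDK.

```javascript
// Hypothetical helpers; names are illustrative, not part of any SDK.

// Assumption: session.update expects all settings nested under `session`.
function buildSessionUpdate() {
  return {
    type: 'session.update',
    session: {
      modalities: ['audio', 'text'],
      instructions: 'Transcribe audio in real-time.',
      input_audio_format: 'pcm16',
      input_audio_transcription: { model: 'whisper-1' },
      turn_detection: {
        type: 'server_vad',
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      },
    },
  };
}

// Assumption: append carries only `type` and the base64-encoded `audio`.
function buildAudioAppend(base64Audio) {
  return { type: 'input_audio_buffer.append', audio: base64Audio };
}

// Assumption: commit takes no session id.
function buildAudioCommit() {
  return { type: 'input_audio_buffer.commit' };
}
```

Each object would be sent with ws.send(JSON.stringify(...)). If server_vad behaves as I understand it, the server commits the buffer on its own when it detects the end of speech, so the explicit commit after every chunk may be unnecessary.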
Android app code:
The Android app captures audio in real-time and sends it to the local WebSocket server (port 3000). Below is the relevant part of the code:
public void sendAudioData(byte[] audioData) {
    if (webSocket != null) {
        String base64Audio = Base64.encodeToString(audioData, Base64.NO_WRAP);
        webSocket.send(base64Audio);
        // Log at most the first 100 characters; substring(0, 100) throws on shorter chunks
        int previewLength = Math.min(100, base64Audio.length());
        Log.d("RealTimeTranslationService", "Audio data sent: " + base64Audio.substring(0, previewLength));
    } else {
        Log.e("RealTimeTranslationService", "WebSocket is not connected.");
    }
}
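One thing worth checking on the server side: the app already sends base64 text, and the server then calls message.toString('base64') on whatever arrives. A minimal sketch of a normalizing helper follows, assuming the ws 'message' handler receives (data, isBinary) as in ws v8, where text frames arrive as a Buffer of UTF-8 bytes, and assuming older ws versions deliver text frames as plain strings. The function name toBase64Audio is hypothetical.

```javascript
// Hypothetical helper: turn an incoming ws message into a single base64 string.
function toBase64Audio(message, isBinary) {
  if (typeof message === 'string') {
    // Older ws versions: text frames arrive as strings that are already base64.
    return message;
  }
  if (isBinary) {
    // Raw PCM bytes sent as a binary frame: encode exactly once.
    return message.toString('base64');
  }
  // Text frame delivered as a Buffer: the bytes are the base64 text itself,
  // so decode them as UTF-8 instead of encoding a second time.
  return message.toString('utf8');
}
```

With this in place the handler could be written as wss.on('connection', (client) => client.on('message', (data, isBinary) => { const audio = toBase64Audio(data, isBinary); ... })), which avoids base64-encoding the payload twice.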
Problem description:
- The OpenAI API returns the error Unknown parameter: 'session', indicating that the session parameter is invalid.
- According to the documentation, the session parameter is required. However, the API rejects it as unknown.
Questions:
- How can I resolve the “unknown parameter: ‘session’” error?
- Is there an issue with the data being sent from the Android app or the server to the OpenAI API?
- Is my request structure to the OpenAI API correct, or are there updates to the Realtime API documentation I may have missed?
Any guidance or insights would be greatly appreciated. Thank you for your help!