Hello, I want to use the new models (gpt-4o-mini-transcribe and gpt-4o-transcribe) for realtime transcription of ongoing audio (so, not a complete file). The guide gives some instructions on how to achieve this, but I feel it is incomplete and I cannot get any audio transcribed.
This is where I am so far. First, I create a WebSocket connection to the given endpoint.
const websocket = new WebSocket(
  "wss://api.openai.com/v1/realtime?intent=transcription",
  {
    headers: {
      Authorization: `Bearer ${this.apiKey}`,
      "openai-beta": "realtime=v1",
    },
  }
);
Already at this step some information is missing from the guide, because apparently the openai-beta header is mandatory.
At this point I wait for the first response from the websocket, because each message I send should contain the session id (again, this key information is missing from the guide).
websocket.addEventListener("open", () => {
let sessionId: string | undefined;
websocket.addEventListener("message", ({ data }) => {
const message = JSON.parse(data.toString());
switch (message.type) {
case "transcription_session.created":
sessionId = message.session.id;
websocket.send(
JSON.stringify({
type: "transcription_session.update",
session: sessionId,
input_audio_format: "pcm16",
input_audio_transcription: {
model: "gpt-4o-transcribe",
prompt: "",
language,
},
turn_detection: {
type: "server_vad",
threshold: 0.5,
prefix_padding_ms: 300,
silence_duration_ms: 500,
},
input_audio_noise_reduction: {
type: "near_field",
},
include: ["item.input_audio_transcription.logprobs"],
})
);
break;
default:
console.log("---->", message, typeof message);
break;
}
});
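For completeness: once the session is configured, my understanding from the guide is that the live audio is pushed to the same socket as base64-encoded PCM16 chunks using input_audio_buffer.append events. This part is only a sketch on my side so far (pcmChunk stands in for my own microphone capture code, it is not something from the guide):

// Sketch: send one chunk of raw PCM16 audio to the transcription session.
// pcmChunk is a placeholder for whatever my audio capture produces.
function sendAudioChunk(websocket: WebSocket, pcmChunk: Buffer) {
  websocket.send(
    JSON.stringify({
      type: "input_audio_buffer.append",
      // the API expects the audio payload to be base64-encoded
      audio: pcmChunk.toString("base64"),
    })
  );
}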
The websocket sends back an error message saying that input_audio_format is not a valid parameter. If I remove it, the same error comes back for input_audio_transcription. I used the very same example request from the guide above.
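My only guess at this point (purely an assumption on my side, the guide does not say this) is that the configuration might need to be nested inside a session object instead of sitting at the top level of the event, something like the following, but I have not been able to confirm it:

// Pure guess: maybe the settings belong inside a "session" object
// rather than at the top level of the update event?
websocket.send(
  JSON.stringify({
    type: "transcription_session.update",
    session: {
      input_audio_format: "pcm16",
      input_audio_transcription: {
        model: "gpt-4o-transcribe",
        prompt: "",
        language,
      },
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      },
      input_audio_noise_reduction: {
        type: "near_field",
      },
      include: ["item.input_audio_transcription.logprobs"],
    },
  })
);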
So, a couple of questions:
- is there an example implementation for this use case?
- is there a different, more complete guide about this, or about the websocket communication in general?
Thanks all.