Hi all,
I originally wrote some JavaScript code for realtime transcription over WebSockets, following this doc:
https://platform.openai.com/docs/guides/realtime-transcription
The relevant code looked pretty straightforward:
...
const OPENAI_REALTIME_MODEL = 'whisper-1';
const OPENAI_REALTIME_URL = `wss://api.openai.com/v1/realtime?model=${OPENAI_REALTIME_MODEL}`;

export async function runTranscription({ connection, customParameters }) {
  return new Promise((resolve, reject) => {
    let sessionIsReady = false;
    let audioBufferQueue = [];

    const most_likely_names = customParameters.names || '';
    const language = customParameters.language || 'nl';

    const openAiWs = new WebSocket(OPENAI_REALTIME_URL, {
      headers: OPENAI_HEADERS
    });

    const sendSessionUpdate = () => {
      const sessionUpdate = {
        type: 'session.update',
        session: {
          input_audio_format: 'g711_ulaw',
          input_audio_transcription: {
            model: OPENAI_REALTIME_MODEL,
            language: language,
            prompt: most_likely_names // tokens that will have higher likelihood of being recognized
          },
          turn_detection: {
            type: 'server_vad',
            threshold: 0.5,
            prefix_padding_ms: 300, // adds x milliseconds of audio before the detected speech turn
            silence_duration_ms: 1200 // after x milliseconds of silence the turn of the caller ends
          },
          input_audio_noise_reduction: {
            type: 'far_field'
          },
          include: ['item.input_audio_transcription.logprobs']
        }
      };
      openAiWs.send(JSON.stringify(sessionUpdate));
    };
...
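For context, the elided part of the code streams the caller's audio into the socket and picks transcripts out of the incoming events. A minimal sketch of those two steps (the helper names are mine, made up for illustration; the event names are the ones my handler listens for):

```javascript
// Outgoing: raw g711_ulaw audio chunks are base64-encoded and wrapped in an
// input_audio_buffer.append event before being sent over the WebSocket.
// (buildAudioAppendEvent is an illustrative name, not from the snippet above.)
function buildAudioAppendEvent(audioChunk) {
  return JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: Buffer.from(audioChunk).toString('base64')
  });
}

// Incoming: finished transcripts arrive as
// conversation.item.input_audio_transcription.completed events.
function extractTranscript(rawMessage) {
  const event = JSON.parse(rawMessage);
  return event.type === 'conversation.item.input_audio_transcription.completed'
    ? event.transcript
    : null;
}
```
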
As you can see, I used Whisper, and this worked up until at least the 30th of April. Without any changes to the code or environment, I ran it again on the 6th of May and got a new error: "Model 'whisper-1' is not supported in realtime mode". So it seems OpenAI changed which models the realtime endpoint supports.
This change is reflected here: https://platform.openai.com/docs/models/whisper-1, where the realtime endpoint is indeed no longer listed for Whisper. So I thought I'd simply switch to the still-realtime-supported models, gpt-4o-transcribe and its mini version. But although those models are shown as having the v1/realtime endpoint, I receive the same message as with the Whisper model: "Error: OpenAI error: Model 'gpt-4o-transcribe' is not supported in realtime mode.".
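Concretely, the swap was nothing more than changing the model constant (and, through it, the connection URL and the session's transcription model):

```javascript
// The attempted fix: point the model constant at gpt-4o-transcribe
// (or gpt-4o-mini-transcribe) instead of whisper-1; everything else,
// including the session.update payload, stayed the same.
const OPENAI_REALTIME_MODEL = 'gpt-4o-transcribe';
const OPENAI_REALTIME_URL = `wss://api.openai.com/v1/realtime?model=${OPENAI_REALTIME_MODEL}`;
```
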
I can't really find any info on these updates, so I'm wondering if anyone knows more about this. Also, does anyone know a good alternative for realtime transcription now?
Many thanks,
Bruno