Realtime transcription model changes

Hi all,

I originally created some JavaScript code for realtime transcription using WebSockets, following this doc:
https://platform.openai.com/docs/guides/realtime-transcription

The relevant code is pretty straightforward:

...


const OPENAI_REALTIME_MODEL = 'whisper-1';
const OPENAI_REALTIME_URL = `wss://api.openai.com/v1/realtime?model=${OPENAI_REALTIME_MODEL}`;

export async function runTranscription({ connection, customParameters }) {

  return new Promise((resolve, reject) => {
    let sessionIsReady = false;
    let audioBufferQueue = [];
    const most_likely_names = customParameters.names || ''; 
    const language = customParameters.language || 'nl';

    const openAiWs = new WebSocket(OPENAI_REALTIME_URL, {
      headers: OPENAI_HEADERS // auth headers, defined in the elided code above
    });

    const sendSessionUpdate = () => {
      const sessionUpdate = {
        type: 'session.update',
        session: {
          input_audio_format: 'g711_ulaw',
          input_audio_transcription: {
            model: OPENAI_REALTIME_MODEL,
            language: language,
            prompt: most_likely_names // tokens that will have higher likelihood of being recognized
          },
          turn_detection: {
            type: 'server_vad',
            threshold: 0.5,
            prefix_padding_ms: 300, // include this many milliseconds of audio before the detected speech turn
            silence_duration_ms: 1200 // end the caller's turn after this many milliseconds of silence
          },
          input_audio_noise_reduction: {
            type: 'far_field' // tuned for distant microphones, e.g. speakerphone audio
          },
          include: ['item.input_audio_transcription.logprobs'] // also return per-token log probabilities with each transcription
        }
      };
      openAiWs.send(JSON.stringify(sessionUpdate));
    };

...
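
The elided parts just wire up the standard ws events; simplified, that wiring looks roughly like this (handler bodies trimmed to the essentials, event name per the realtime docs):

    openAiWs.on('open', () => {
      sendSessionUpdate(); // configure the session as soon as the socket opens
      sessionIsReady = true;
      // flush any audio that arrived before the session was configured
      audioBufferQueue.forEach((chunk) => openAiWs.send(chunk));
      audioBufferQueue = [];
    });

    openAiWs.on('message', (raw) => {
      const event = JSON.parse(raw);
      // final transcript for a completed speech turn
      if (event.type === 'conversation.item.input_audio_transcription.completed') {
        console.log('Transcript:', event.transcript);
      }
    });

    openAiWs.on('error', (err) => reject(err));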

As you can see, I used Whisper, and this worked up until at least the 30th of April. Without any changes in code or environment, I ran it again on the 6th of May and got a new error: "Model 'whisper-1' is not supported in realtime mode". So it seems OpenAI changed the supported realtime models.

This change is reflected here: https://platform.openai.com/docs/models/whisper-1, where the realtime endpoint is indeed no longer listed for Whisper. So I thought I'd simply switch to the models that are still listed as realtime-supported: gpt-4o-transcribe and its mini version. But that turns out not to work either: although those model pages still show the v1/realtime endpoint, I get the same error as with Whisper: "Error: OpenAI error: Model 'gpt-4o-transcribe' is not supported in realtime mode.".
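
For completeness: the transcription guide I linked also describes a dedicated transcription session, where you connect with ?intent=transcription instead of passing a model in the URL, and configure it with a transcription_session.update event instead of session.update. A sketch of that variant, untested on my side and assuming the API key lives in process.env.OPENAI_API_KEY:

import WebSocket from 'ws';

const ws = new WebSocket('wss://api.openai.com/v1/realtime?intent=transcription', {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // assumption: key in an env var
    'OpenAI-Beta': 'realtime=v1'
  }
});

ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'transcription_session.update',
    session: {
      input_audio_format: 'g711_ulaw',
      input_audio_transcription: {
        model: 'gpt-4o-transcribe', // or gpt-4o-mini-transcribe
        language: 'nl'
      },
      turn_detection: { type: 'server_vad', silence_duration_ms: 1200 }
    }
  }));
});

I haven't tested whether this path is hit by the same model restriction, so no guarantees.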

I can't really find any info on these changes, so I'm wondering if anyone knows more about this. Also, does anyone know a good alternative for realtime transcription in the meantime?

Many thanks,
Bruno