Unable to update OpenAI Realtime session (transcription prompt)

Hi,

I want to update the input_audio_transcription prompt during a Realtime session, as mentioned here:

I use the following code:

    const transcriptionUpdate = {
      type: "transcription_session.update",
      session: {
        input_audio_format: "g711_ulaw", // or "pcm16"
        input_audio_transcription: {
          model: "gpt-4o-transcribe",
          prompt: conversationHistoryForPrompt,
          language: "fr",
        },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          prefix_padding_ms: 300,
          silence_duration_ms: 500,
          // create_response: true, // By the way, I have to comment this out or I get the error "Unknown parameter: 'session.turn_detection.create_response'"
        },
        input_audio_noise_reduction: {
          type: "near_field",
        },
        // include: ["item.input_audio_transcription.logprobs"], // commented out, I don't need it
      },
    };

But I get the following error:

 openAiWs.on('message') - OpenAI message received: {
  "type": "error",
  "event_id": "event_xxxxxxxxxx",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_parameter",
    "message": "Passing a transcription session update event to a realtime session is not allowed.",
    "param": "",
    "event_id": null
  }
}

So, is it impossible to update the transcription during a Realtime session (over WebSocket)? If so, why is it mentioned in the OpenAI docs? Am I missing something here?

Thanks a lot.


I think you are using the wrong endpoint. Try this:
realtime-ws-endpoint: wss://api.openai.com/v1/realtime?intent=transcription

Thanks. How should I use your recommendation in my code? Could you give a few lines of code to detail your answer? Thanks.

Hello, @regisAG! :]

Just hit this issue myself. Did you manage to find a solution?

This only works if you’re connecting to the Realtime API via WebSockets. If you’re using WebRTC or SIP, then it won’t help.

That said, this page helped me heaps, because the fact that you have to specify an intent parameter in the WebSocket URL is not mentioned anywhere in the Realtime API docs; instead it's only covered here, in the Speech to Text docs. It really should appear on the Realtime Transcription page too. That page also seems to indicate that you have to use WebSockets if you want to do realtime transcription.

(btw for anyone else coming here later, you also can’t specify a model if you set the intent to transcription)

(also pinging @andreped in case this is helpful for you also)

P.S. regisAG, it’s hard to provide a code snippet explaining how to use that URL, since it depends on what language you’re using. But if you work through a basic tutorial on WebSockets in your environment and read the Client Events and Server Events pages in the OpenAI API docs, it should become clear how the URL is meant to be used: you use it to set up a WS connection, then send a session.update with the transcription session parameters, and then start sending audio chunks.
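To make the P.S. above concrete, here is a minimal Node.js sketch, not from this thread: it assumes the `ws` package and an `OPENAI_API_KEY` environment variable, and the helper name `buildTranscriptionUpdate` is illustrative, not part of any OpenAI SDK. It just builds the transcription session update event and shows where the `intent=transcription` URL fits.

```javascript
// Sketch: connecting to the transcription-intent endpoint (assumes the "ws"
// npm package; buildTranscriptionUpdate is an illustrative helper name).
const TRANSCRIPTION_URL = "wss://api.openai.com/v1/realtime?intent=transcription";

// Build the update event for a transcription session. Note: no model in the
// URL when intent=transcription; the transcription model goes in the session body.
function buildTranscriptionUpdate(prompt) {
  return {
    type: "transcription_session.update",
    session: {
      input_audio_format: "g711_ulaw",
      input_audio_transcription: {
        model: "gpt-4o-transcribe",
        prompt,
        language: "fr",
      },
      turn_detection: { type: "server_vad", threshold: 0.5 },
    },
  };
}

// Usage (requires a real API key; not executed here):
// const WebSocket = require("ws");
// const ws = new WebSocket(TRANSCRIPTION_URL, {
//   headers: {
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//     "OpenAI-Beta": "realtime=v1",
//   },
// });
// ws.on("open", () =>
//   ws.send(JSON.stringify(buildTranscriptionUpdate("conversation context..."))));
// // ...then start sending input_audio_buffer.append events with audio chunks.
```

The point is that the update event itself is the same shape regisAG already had; what changes is the URL the socket is opened against.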