Realtime API: session update doesn't change input audio format

Guys,

I am updating the session and trying to change instructions, input_audio_format and output_audio_format (setting both formats to 'g711_alaw').
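For reference, the session.update I'm sending is shaped roughly like this (instructions trimmed to a placeholder):

{
  "type": "session.update",
  "session": {
    "instructions": "...",
    "input_audio_format": "g711_alaw",
    "output_audio_format": "g711_alaw"
  }
}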

Server is sending a session.updated event with all values correctly updated except:

input_audio_format: 'pcm16',

Any ideas? Is this a bug?

5 Likes

I also noticed that I cannot set the input_audio_transcription property "enabled" to true:

ERROR: Unknown parameter: 'session.input_audio_transcription.enabled'.
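That error comes from a payload shaped roughly like the documented example:

{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "enabled": true,
      "model": "whisper-1"
    }
  }
}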

3 Likes

I ran into the same thing. I believe this must be a bug because it is contrary to the official documentation at https://platform.openai.com/docs/api-reference/realtime-client-events/session-update

I tried removing enabled, in which case it doesn't error, but there is still no transcription occurring.

I think we might need help from someone on the OpenAI team.

4 Likes

I have the same issue: g711 does not work and transcription does not work either.

1 Like

I did notice that if you leave enabled: true out, it does start giving you a transcription.
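That is, sending just the model with no enabled flag, along the lines of:

{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {"model": "whisper-1"}
  }
}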

1 Like

You are right: it looks like they removed the 'enabled' attribute.

The same problem also occurs with:

"max_output_tokens"

and temperature can't be set lower than 0.6.

3 Likes

A little update: after experimenting with codecs, it now looks like I am able to submit g711_alaw audio even though the session.updated event still reports pcm16. So session.update works as expected (the codec is changed); it's just the field in the session.updated event that isn't refreshed and still displays the original value :slight_smile:
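To be concrete, the a-law audio I stream in via input_audio_buffer.append (base64 payload elided here) is accepted and understood fine:

{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded g711_alaw bytes>"
}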

1 Like

Yes, the documentation is wrong: it's max_response_output_tokens, not max_output_tokens.
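So an update along these lines goes through (the values are just examples; 0.6 is the minimum temperature mentioned above):

{
  "type": "session.update",
  "session": {
    "max_response_output_tokens": 1500,
    "temperature": 0.6
  }
}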

3 Likes

@todd.fisher @mwoop, where are you seeing the transcription come out? If I send the payload:

{
    "type": "session.update",
    "session": {
      "input_audio_transcription":  {"model": "whisper-1"}
    }
}

I see the session get updated, but I don't see any conversation.item.input_audio_transcription.* events.

I also don't see any input transcription events when I leave enabled out and still have

{"model": "whisper-1"}

left in my session.update.

It doesn’t error anymore but I don’t get any transcriptions.
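For reference, the event I'd expect once transcription works is conversation.item.input_audio_transcription.completed, shaped something like this (ids are placeholders):

{
  "type": "conversation.item.input_audio_transcription.completed",
  "event_id": "event_...",
  "item_id": "item_...",
  "content_index": 0,
  "transcript": "..."
}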

1 Like

Did any of you solve this? I haven’t seen any input transcriptions.

2 Likes

Transcriptions also not working for us! Any movement on this?

2 Likes

Yeah, same here. Audio input transcriptions still aren't working. I'm using an integration with Twilio, and I wonder if that's part of the problem. I'd be interested to know what the use cases are for the people it's working for.

Today some magic happened and I got transcription.
I'm using Java to connect to the Realtime API.
I also have a web client with wav_recorder.js from the OpenAI example; before that I had a different recorder implementation. I think the key is to use a 24000 sample rate.
Here is my session.update. I'm not sure, but I think "turn_detection": null is also important if you are not using server VAD:

{"type":"session.update","session":{"modalities":["text","audio"],"input_audio_format":"pcm16","instructions":"Make transcription from my speech","turn_detection":null,"input_audio_transcription":{"model":"whisper-1"}}}

This is my response.create:

{
  "type": "response.create",
  "response": {
    "instructions": "Make transcription from my speech",
    "modalities": ["text"]
  }
}

And then I got this:

{
  "type": "conversation.item.input_audio_transcription.completed",
  "event_id": "event_ALCCySqEmJFHCYcetC5Ct",
  "item_id": "item_ALCCwjSTw6cvGdZmpyQhC",
  "content_index": 0,
  "transcript": "Hello, how are you?\n"
}

Well, I am also struggling with audio input transcriptions. I don't use Twilio or any third-party integration. I use server VAD, and I don't get an "enabled" field in the session.updated JSON response, so as someone said, I can confirm the documentation seems out of date (also for max_response_output_tokens, by the way).
Occasionally I do get a conversation.item.input_audio_transcription.completed event, but the transcript is totally wrong! :sob:

Confirmed here as well. My transcripts come back in various languages, including Chinese characters.

1 Like

Most of the issues with not getting a transcription have to do with the input audio that is being passed in.

Write it to a file and listen to it, maybe you’ll spot some errors.

Make sure that the sample rate is 24000 Hz, as the API requires this.

Make sure that the audio doesn't sound distorted, cut out, sped up or down, or pitched up/down.

Make sure there is audio in the first place.

Good luck everyone! :hugs:

I tried the solutions in this topic and made sure my recorded audio file is free of errors, but I still haven't seen any transcription. Can anyone help?

What language is the speech in the file?
Are you getting any errors from OpenAI? :blush: