Realtime API: session update doesn't change input audio format

mwoop · October 4, 2024, 11:40am

Guys,

I am updating the session and trying to change instructions, input_audio_format and output_audio_format (setting both to ‘g711_alaw’).

The server sends a session.updated event with all values correctly updated except:
…
input_audio_format: ‘pcm16’,
…

Any ideas? Is this a bug?
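For reference, here is a minimal Python sketch of the session.update event described in this post. It assumes the event is sent as a JSON text frame over the Realtime WebSocket, and it leaves the connection itself out. The function name build_session_update and the sample instructions are hypothetical.

```python
import json


def build_session_update(instructions: str, audio_format: str = "g711_alaw") -> str:
    """Serialize a session.update event that sets instructions and both audio formats."""
    event = {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            # Same codec in both directions, as in the post above.
            "input_audio_format": audio_format,
            "output_audio_format": audio_format,
        },
    }
    return json.dumps(event)


# Usage: the string you would send as one text frame on the socket.
payload = build_session_update("You are a helpful phone agent.")
print(payload)
```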

mwoop · October 4, 2024, 12:12pm

I also noticed that I cannot set the input_audio_transcription property “enabled” to true:

ERROR: Unknown parameter: ‘session.input_audio_transcription.enabled’.

ultimus · October 4, 2024, 5:14pm

I ran into the same thing. I believe this must be a bug because it is contrary to the official documentation at https://platform.openai.com/docs/api-reference/realtime-client-events/session-update

I tried removing enabled. In that case it doesn’t error, but there is still no transcription.

I think we might need help from someone on the OpenAI team.

narayana · October 4, 2024, 7:12pm

I have the same issue: g711 does not work, and transcription does not work.

todd.fisher · October 4, 2024, 8:49pm

I do notice that if you leave out enabled: true, it does start giving you a transcription.

mwoop · October 5, 2024, 5:41am

You are right - it looks like they removed the attribute ‘enabled’.
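Based on the replies above, here is a sketch of a transcription config that simply omits `enabled` and names the model instead. The helper names and the REJECTED_TRANSCRIPTION_KEYS set are hypothetical. The set is derived only from the "Unknown parameter" error reported in this thread.

```python
import json

# Keys that, per this thread, the server rejected with "Unknown parameter".
REJECTED_TRANSCRIPTION_KEYS = {"enabled"}


def transcription_session_update(model: str = "whisper-1") -> dict:
    """Build a session.update that turns on input transcription by naming a model."""
    return {
        "type": "session.update",
        "session": {"input_audio_transcription": {"model": model}},
    }


def strip_rejected_keys(config: dict) -> dict:
    """Return a copy of an input_audio_transcription config without rejected keys."""
    return {k: v for k, v in config.items() if k not in REJECTED_TRANSCRIPTION_KEYS}


old_style = {"enabled": True, "model": "whisper-1"}
print(strip_rejected_keys(old_style))  # {'model': 'whisper-1'}
print(json.dumps(transcription_session_update()))
```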

radukalexey · October 5, 2024, 6:59am

Also the same problem with:

"max_output_tokens"

And temperature can’t be lower than 0.6.

mwoop · October 7, 2024, 8:00am

A little update: after testing with codecs, it now looks like I am able to submit g711_alaw audio, even though session.updated still reports pcm16. So session.update seems to work as expected (the codec is updated). Only the input_audio_format field in the session.updated event is not refreshed and still shows the original value.
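To catch this kind of discrepancy programmatically, here is a small sketch that compares the fields you sent against the session object echoed back in session.updated. The function name is hypothetical, and the sample values mirror this post. As the post notes, a mismatch here does not necessarily mean the setting failed to apply.

```python
def session_mismatches(requested: dict, reported: dict) -> dict:
    """Map each requested session field to (requested, reported) when they differ.

    `reported` is the "session" object from a session.updated server event.
    """
    return {
        key: (value, reported.get(key))
        for key, value in requested.items()
        if reported.get(key) != value
    }


requested = {"input_audio_format": "g711_alaw", "output_audio_format": "g711_alaw"}
reported = {"input_audio_format": "pcm16", "output_audio_format": "g711_alaw"}
print(session_mismatches(requested, reported))
# {'input_audio_format': ('g711_alaw', 'pcm16')}
```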

Echoshard · October 8, 2024, 1:42am

Yes, the documentation is wrong: it’s max_response_output_tokens, not max_output_tokens.
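Here is a sketch of the session fields using the corrected name. It assumes that temperature values below 0.6 should be raised to the floor reported earlier in this thread rather than sent as-is. The helper and constant names are hypothetical.

```python
TEMPERATURE_FLOOR = 0.6  # lowest value reported as accepted earlier in this thread


def response_limits(max_tokens: int, temperature: float) -> dict:
    """Build session fields using the field name the server accepts."""
    return {
        # The server rejects "max_output_tokens"; this is the accepted name.
        "max_response_output_tokens": max_tokens,
        # Raise too-low values to the reported floor instead of triggering an error.
        "temperature": max(temperature, TEMPERATURE_FLOOR),
    }


print(response_limits(512, 0.2))
# {'max_response_output_tokens': 512, 'temperature': 0.6}
```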

mdagost · October 9, 2024, 8:22pm

@todd.fisher @mwoop where are you seeing the transcription come out? If I send the payload:

{
    "type": "session.update",
    "session": {
      "input_audio_transcription":  {"model": "whisper-1"}
    }
}

I see the session get updated, but don’t see any conversation.item.input_audio_transcription.* events.
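When checking for these events, it helps to filter the incoming frames explicitly so that a transcript cannot get lost among other event types. Here is a minimal sketch, assuming server events arrive as JSON text frames and that completed events carry item_id and transcript, as in the event al13 posts further down. The function name is hypothetical, and the shape of the `.failed` event (an "error" object) is an assumption.

```python
import json

TRANSCRIPTION_PREFIX = "conversation.item.input_audio_transcription."


def transcripts(raw_events):
    """Yield (item_id, transcript) pairs from input transcription events.

    `raw_events` is any iterable of JSON text frames received from the server.
    Other event types are skipped; failures are printed so they are not lost.
    """
    for raw in raw_events:
        event = json.loads(raw)
        event_type = event.get("type", "")
        if not event_type.startswith(TRANSCRIPTION_PREFIX):
            continue
        if event_type.endswith(".completed"):
            yield event["item_id"], event["transcript"]
        elif event_type.endswith(".failed"):
            # Assumed shape: the failure event carries an "error" object.
            print("transcription failed:", event.get("item_id"), event.get("error"))


sample = (
    '{"type":"conversation.item.input_audio_transcription.completed",'
    '"item_id":"item_1","content_index":0,"transcript":"Hello, how are you?\\n"}'
)
for item_id, text in transcripts([sample, '{"type":"session.updated"}']):
    print(item_id, text.strip())  # item_1 Hello, how are you?
```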

Eigenspan · October 9, 2024, 9:32pm

I also don’t see any input transcription events when I leave enabled out and still have

{"model": "whisper-1"}

left in my session.update.

It doesn’t error anymore, but I don’t get any transcriptions.

zlandes · October 18, 2024, 7:36pm

Did any of you solve this? I haven’t seen any input transcriptions.

philippeWander · October 22, 2024, 5:20am

Transcriptions also not working for us! Any movement on this?

kevin.g.stjohn · October 22, 2024, 3:15pm

Yeah, same here. Audio input transcriptions still aren’t working. I’m using an integration with Twilio, and I wonder if that’s part of the problem. I’d be interested to know the use cases of the people for whom it is working.

al13 · October 22, 2024, 4:38pm

Today some magic happened and I got a transcription.
I’m using Java to connect to the Realtime API.
I also have a web client with wav_recorder.js from the OpenAI example; before, I had a different recorder implementation. I think the key is to use a 24000 Hz sample rate.
Here is my session.update. I’m not sure, but I think "turn_detection": null is also important if you are not using server VAD.

{"type":"session.update","session":{"modalities":["text","audio"],"input_audio_format":"pcm16","instructions":"Make transcription from my speech","turn_detection":null,"input_audio_transcription":{"model":"whisper-1"}}}

This is my response.create:

{"response":{"instructions":"Make transcription from my speech","modalities":["text"]},"type":"response.create"}

And then I got this:

{"type":"conversation.item.input_audio_transcription.completed","event_id":"event_ALCCySqEmJFHCYcetC5Ct","item_id":"item_ALCCwjSTw6cvGdZmpyQhC","content_index":0,"transcript":"Hello, how are you?\n"}
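With "turn_detection": null, there is no server VAD to decide when a turn ends, so the client has to close the turn itself. The post above doesn’t show that step. Here is a minimal sketch of the event sequence, assuming 24 kHz mono pcm16 audio: base64 audio in input_audio_buffer.append events, then input_audio_buffer.commit, then response.create. The chunk size and function name are hypothetical.

```python
import base64
import json

CHUNK_BYTES = 4800  # hypothetical chunk size: 100 ms of 24 kHz mono pcm16 (2 bytes/sample)


def manual_turn_events(pcm16: bytes, instructions: str) -> list:
    """Build the client events for one user turn when turn_detection is null.

    Without server VAD, nothing commits the input buffer automatically, so the
    client appends audio, commits it, and then asks for a response.
    """
    events = []
    for start in range(0, len(pcm16), CHUNK_BYTES):
        chunk = pcm16[start:start + CHUNK_BYTES]
        events.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    events.append({"type": "input_audio_buffer.commit"})
    events.append({
        "type": "response.create",
        "response": {"modalities": ["text"], "instructions": instructions},
    })
    return [json.dumps(e) for e in events]


one_second_of_silence = bytes(48000)  # 24000 samples * 2 bytes
frames = manual_turn_events(one_second_of_silence, "Make transcription from my speech")
print(len(frames))  # 12: ten appends, one commit, one response.create
```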

littlegrasshopper · October 22, 2024, 8:50pm

Well, I am also struggling with audio input transcriptions. I don’t use Twilio or any third-party integration. I use server VAD, and I don’t get an “enabled” field in the session.updated JSON response, so, as someone said, I can confirm the documentation seems out of date (also for max_response_output_tokens, by the way).
Occasionally I do get a “conversation.item.input_audio_transcription.completed” event, but the transcript is totally wrong!

mwoop · October 22, 2024, 9:06pm

Confirmed here as well. My transcripts come back in various languages, including Chinese characters.

j.wischnat · October 28, 2024, 6:59am

When no transcription arrives, the cause is usually the input audio being passed in.

Write it to a file and listen to it; maybe you’ll spot some errors.

Make sure that the sample rate is 24000 Hz, as the API requires this for pcm16 input.

Make sure that the audio doesn’t sound distorted, cut out, sped up or slowed down, or pitched up/down.

Make sure there is audio in the first place.

Good luck everyone!
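Following the checklist above, here is a minimal Python sketch that assumes the input is raw mono little-endian pcm16 at 24000 Hz. It wraps the buffer in a WAV file you can play back and reports the duration implied by the byte count. A clip that plays at the wrong speed, or runs twice as long as you spoke, hints at a sample-rate mismatch. It also estimates loudness in dBFS to catch silent buffers. The helper names and the file name are hypothetical.

```python
import array
import math
import os
import sys
import tempfile
import wave

SAMPLE_RATE = 24000  # rate expected for pcm16 input, per the advice above


def dump_wav(pcm16: bytes, path: str, sample_rate: int = SAMPLE_RATE) -> None:
    """Wrap raw mono 16-bit PCM in a WAV header so it can be played back."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)


def duration_seconds(pcm16: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration implied by the byte count; compare it with how long you spoke."""
    return len(pcm16) / 2 / sample_rate


def rms_dbfs(pcm16: bytes) -> float:
    """RMS level in dBFS; -inf (or a very low value) suggests there is no audio."""
    samples = array.array("h")
    samples.frombytes(pcm16)  # needs an even byte count
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed little-endian
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


# Usage: one second of a 440 Hz test tone standing in for captured audio.
tone = array.array(
    "h", (int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(SAMPLE_RATE))
)
pcm = tone.tobytes()  # native byte order; little-endian on typical hosts
dump_wav(pcm, os.path.join(tempfile.gettempdir(), "debug_input.wav"))
print(duration_seconds(pcm))  # 1.0
print(round(rms_dbfs(pcm), 1))  # roughly -15 dBFS for this tone
```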

fylin931 · October 29, 2024, 3:43am

I tried the solution in this topic and made sure my recorded audio file has no errors, but I still have not seen any transcription. Can anyone help?

j.wischnat · October 31, 2024, 10:41am

What language is the speech in the file?
Are you getting any errors from OpenAI?
