Input_audio_transcription not working in the Realtime API — related to g711_ulaw?

I had the same problem as everyone here, i.e. enabling the whisper-1 model and using g711_ulaw with Twilio did not produce any input audio transcript. In the end, I logged everything coming back from OpenAI inside the send_to_twilio function:

async for openai_message in openai_ws:
    response = json.loads(openai_message)
    # Dump every raw event from the Realtime API to see what actually comes back
    print(f"Full event received: {json.dumps(response, indent=2)}")

After looking at the logs, I noticed the user's audio transcript is only emitted once the transcription itself completes (the conversation.item.input_audio_transcription.completed event), NOT when the audio is committed to the input_audio_buffer.
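Also note that these transcription events only fire if input audio transcription is enabled on the session in the first place, via session.update. Something like this (a sketch; your audio format and turn detection settings may differ):

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_format": "g711_ulaw",   # Twilio media streams send mu-law audio
        "output_audio_format": "g711_ulaw",
        # This is the line that enables the whisper-1 transcript events
        "input_audio_transcription": {"model": "whisper-1"},
        "turn_detection": {"type": "server_vad"},
    },
}
await openai_ws.send(json.dumps(session_update))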

If anyone wants to try, here are the lines of code:

if response.get('type') == 'conversation.item.input_audio_transcription.completed':
    # The transcript arrives on this event, not on input_audio_buffer.committed
    transcription = response.get('transcript')
    if transcription:
        print(f"User said: {transcription}")