I had the same problem as everyone here, i.e. enabling the whisper-1 model and using g711_ulaw with Twilio did not produce any input audio transcript. In the end, I logged every event coming back from OpenAI inside the send_to_twilio function:
async for openai_message in openai_ws:
    response = json.loads(openai_message)
    print(f"Full event received: {json.dumps(response, indent=2)}")
After looking at the logs, I noticed the user audio transcription is generated when the transcription completes (the conversation.item.input_audio_transcription.completed event) and NOT when the audio is committed to the input_audio_buffer.
If anyone wants to try, here are the lines of code:
# The caller's transcript arrives only in this event type
if response.get('type') == 'conversation.item.input_audio_transcription.completed':
    transcription = response.get('transcript')
    if transcription:
        print(f"User said: {transcription}")