The right event to handle to get the user's transcription is this:
conversation.item.input_audio_transcription.completed
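
As a rough sketch of how you might handle it: this assumes server events arrive as JSON text messages (for example over a WebSocket) and that the transcribed text sits in the event's `transcript` field. The function name `extract_user_transcript` and the sample payload are made up for illustration, so adapt them to your own client code.

```python
import json

# Event type named above; it signals that transcription of the user's audio has finished.
TRANSCRIPTION_DONE = "conversation.item.input_audio_transcription.completed"


def extract_user_transcript(raw_message: str):
    """Return the user's transcript if this server event carries one, else None."""
    event = json.loads(raw_message)
    if event.get("type") != TRANSCRIPTION_DONE:
        return None  # some other server event; ignore it here
    # Assumption: the transcribed text is in the "transcript" field.
    return event.get("transcript")


# Hypothetical sample payload, only to show the shape of the handling.
sample = json.dumps({
    "type": TRANSCRIPTION_DONE,
    "item_id": "item_123",
    "content_index": 0,
    "transcript": "Hello there",
})
print(extract_user_transcript(sample))
```

You would call `extract_user_transcript` on every incoming message and act only when it returns a value. If the event never arrives, it may be worth checking that input audio transcription is enabled in your session configuration.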
Also, this topic could be relevant: