I have noticed curious cases where the agent was clearly responding to background dialog (e.g. a podcast interview playing nearby), but that dialog never made it into the transcription update returned by the server.
It is as if the model has access to the unfiltered microphone input, while the transcription returned to the client goes through an additional filter of some kind. The agent is also quite clearly reacting to emotion and loudness in the voice.
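For anyone who wants to check this on their own setup, here is a minimal logging sketch. It assumes the OpenAI Realtime API over WebSocket; the model name, URL, and event names are assumptions if your agent runs on a different stack. It prints the server's input transcription next to the agent's own response transcript, so you can compare what the agent reacted to against what was actually transcribed:

```python
# Minimal logging harness, assuming the OpenAI Realtime API over WebSocket.
# If your agent uses a different backend, treat these event names as placeholders.
import asyncio
import json
import os
import time

import websockets  # pip install websockets


URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def main():
    # Use extra_headers= instead of additional_headers= on older websockets versions.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the server to also run input transcription. Note this is a
        # separate pass: the speech model consumes the raw audio regardless
        # of whether transcription is enabled.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"input_audio_transcription": {"model": "whisper-1"}},
        }))

        async for raw in ws:
            event = json.loads(raw)
            stamp = time.strftime("%H:%M:%S")
            if event["type"] == "conversation.item.input_audio_transcription.completed":
                # What the transcription pass claims it heard from the mic.
                print(f"[{stamp}] USER (transcript): {event['transcript']!r}")
            elif event["type"] == "response.audio_transcript.done":
                # What the agent actually said; compare with the line above to
                # spot responses to audio that never appeared in the transcript.
                print(f"[{stamp}] AGENT: {event['transcript']!r}")


asyncio.run(main())
```

In my sessions, mismatches show up exactly as described: the AGENT line references background speech that never appears in any USER transcript line.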
Has anybody else noticed something similar? In my opinion, the transcription returned to the client should be more detailed and include all of this information.