Does ChatGPT advanced voice mode directly use my audio data instead of transcribing the audio to text first?

Can ChatGPT advanced voice mode recognize my tone and emotions by directly ‘listening to my audio’?
I know that it can output audio directly, without a separate TTS step.