Realtime API charges a huge number of text input tokens when the input is AUDIO only!

Hello everyone!

Could anyone explain the logic behind being charged 11800 input text tokens when all I did was say “Hello” into my microphone?