Realtime API, input audio tokens exploding

I have a use case where I keep a session open for 30+ minutes, and I can see input audio tokens adding up very quickly.

My understanding is that the input audio buffer keeps growing, even though I force a commit every 10-20 seconds. Apparently this happens because VAD is off.
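For reference, the per-turn flow described above can be sketched roughly as follows. This is a minimal sketch, not the actual client code: the event types `input_audio_buffer.append`, `input_audio_buffer.commit`, and `response.create` are the Realtime API client events, the base64 `audio` field is an assumption about the append payload, and the helper names (`append_event`, `turn_events`, and so on) are hypothetical. It only builds the JSON messages; sending them over the WebSocket is left out.

```python
import base64
import json


def append_event(pcm_chunk: bytes) -> str:
    """Build an input_audio_buffer.append event carrying one audio chunk."""
    # Assumption: audio bytes are sent base64-encoded in the "audio" field.
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


def commit_event() -> str:
    """Build the forced commit that is sent every 10-20 seconds (VAD is off)."""
    return json.dumps({"type": "input_audio_buffer.commit"})


def response_create_event() -> str:
    """Build the event that asks the model to respond after the commit."""
    return json.dumps({"type": "response.create"})


def turn_events(chunks: list[bytes]) -> list[str]:
    """Return the ordered messages for one manual turn: appends, commit, response."""
    events = [append_event(chunk) for chunk in chunks]
    events.append(commit_event())
    events.append(response_create_event())
    return events
```

Each call to `turn_events` corresponds to one 10-20 second window of audio; whether an `input_audio_buffer.clear` is also needed after the commit is exactly what the questions below ask.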

My questions are:

  1. Is my understanding correct that, unless I send input_audio_buffer.clear, the buffer keeps growing even after a commit and a response.create?

  2. If I send input_audio_buffer.clear, would I lose previous context? Or, as long as it's sent after a commit, would the conversation items still hold the needed context?