Realtime API text and audio

Can someone clarify?

For every call to the Realtime API, are we charged for:

text: input/output

AND

audio: input/output

or only one or the other, depending on the modality of the call?
