I’ve noticed that if I manually create conversation items with the Realtime API, it stops generating audio responses and only responds with text, even though modalities is set to both text and audio.
Thanks. Including both modalities doesn’t seem to help if I manually add a bunch of text conversation items. That post is very relevant to my problem, though.
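
For anyone trying to reproduce this, here is a rough sketch of the kind of event sequence I mean. It only builds the JSON payloads and does not open a connection. The event and field names (`conversation.item.create`, `response.create`, `modalities`, `input_text` / `text` content types) are my understanding of the beta Realtime API and may differ in other API versions; the helper names (`text_item_event`, `build_events`) are made up for illustration.

```python
import json


def text_item_event(role: str, text: str) -> dict:
    """Build a conversation.item.create event for a text-only message.

    Assumption: user/system text uses content type "input_text" and
    assistant text uses "text", as in the beta event shapes.
    """
    content_type = "text" if role == "assistant" else "input_text"
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": role,
            "content": [{"type": content_type, "text": text}],
        },
    }


def response_create_event() -> dict:
    """Build a response.create event that explicitly asks for both modalities."""
    return {
        "type": "response.create",
        "response": {"modalities": ["text", "audio"]},
    }


def build_events(history: list[tuple[str, str]]) -> list[dict]:
    """Turn (role, text) pairs into item events followed by one response request."""
    events = [text_item_event(role, text) for role, text in history]
    events.append(response_create_event())
    return events


if __name__ == "__main__":
    # Hypothetical prior conversation injected as text-only items.
    history = [
        ("user", "Hi there"),
        ("assistant", "Hello! How can I help?"),
        ("user", "Tell me a joke"),
    ]
    for event in build_events(history):
        # Each line would be sent as one WebSocket message.
        print(json.dumps(event))
```

With a history like this injected, the response I get back contains text only, even though the `response.create` payload requests both modalities.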