I’ve noticed that if I manually create conversation items with the Realtime API, it stops generating audio responses and responds only with text, even though modalities is set to both text and audio.
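For anyone trying to reproduce this, the setup looks roughly like the sketch below. It only builds the JSON events that would be sent over the Realtime WebSocket; it does not open a connection. The helper names (`text_item_event`, `response_create_event`, `build_events`) are made up for illustration, and the event shapes (`conversation.item.create`, `response.create`, `input_text` vs. `text` content parts) are my understanding of the API reference, so please check them against the current docs.

```python
import json


def text_item_event(role, text):
    """Build a conversation.item.create event for one text message."""
    # Assumption: user/system turns use "input_text" content parts,
    # while assistant turns use "text" content parts.
    content_type = "text" if role == "assistant" else "input_text"
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": role,
            "content": [{"type": content_type, "text": text}],
        },
    }


def response_create_event():
    """Build a response.create event that asks for both text and audio."""
    return {
        "type": "response.create",
        "response": {"modalities": ["text", "audio"]},
    }


def build_events(history):
    """Turn (role, text) pairs into item events, then request a response."""
    events = [text_item_event(role, text) for role, text in history]
    events.append(response_create_event())
    return events


if __name__ == "__main__":
    # Hypothetical history; each printed line is one event to send.
    history = [("user", "Hi there"), ("assistant", "Hello! How can I help?")]
    for event in build_events(history):
        print(json.dumps(event))
```

Sending events like these, with several text items created by hand before the `response.create`, is the situation in which I only get text back.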
Thanks. Including both modalities doesn’t seem to help if I manually add a bunch of text conversation items. That post is very relevant to my problem, though.
I am using the “gpt-4o-audio” model and seem to have the same issue. After adding a conversation with long text, the model doesn’t seem to output any audio object…