Based on the API information shared, it seems that ChatCompletion for gpt-3.5-turbo is a single, stateless pass over the messages array, rather than the "interactive" session that is available via the ChatGPT research portal.
It does seem potentially expensive if, for each continuation of a chat, the system context and all previous related messages need to be sent back.
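To illustrate what I mean, here is a minimal sketch of how a stateless chat loop would have to work. The `Chat` class and the `fake_completion` stub are hypothetical; the stub stands in for `openai.ChatCompletion.create(model=..., messages=...)` so the example runs without an API key, but the payload shape mirrors the documented request format:

```python
def fake_completion(model, messages):
    # Stand-in for openai.ChatCompletion.create(model=..., messages=...);
    # echoes the last user message so the sketch is self-contained.
    reply = f"(reply to: {messages[-1]['content']})"
    return {"choices": [{"message": {"role": "assistant", "content": reply}}]}

class Chat:
    def __init__(self, system_prompt):
        # The system message must ride along with every single request.
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        # The ENTIRE history is resent on each call -- this is the cost concern,
        # since billing is per token submitted plus tokens generated.
        resp = fake_completion("gpt-3.5-turbo", self.messages)
        answer = resp["choices"][0]["message"]
        self.messages.append(answer)
        return answer["content"]

chat = Chat("You are a helpful assistant.")
chat.send("Hello")
chat.send("Follow-up question")
# After only two turns, the payload already carries
# 1 system + 2 user + 2 assistant messages.
print(len(chat.messages))  # -> 5
```

So the token count (and cost) of each request grows with every turn, unless the client trims or summarizes the history itself.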
Does anyone have any insights on this? Or is there a way to keep a chat channel open?