Feature Request - Conversation enumerated state

Hey all,

So we find it hard to determine when the conversation has naturally ended, when the given model needs information from the user, or when it could simply continue. We’ve tried most of the available text-capable models and the results are the same: 4, legacy 3.5, o1-mini…

This is particularly challenging with the realtime voice models, as any requested prefix / postfix will be spoken as well.

We’ve tried various system prompts like:

(For Followups) If you need to ask a question, always include a ? (question mark) at the end. Or prefix your response with FOLLOWUP_NEEDED.

(For Ending Naturally) If the conversation can end naturally, prefix your response with END_NATURALLY.

(For Continuing an open-ended conversation) If the conversation can continue naturally, prefix your response with CAN_CONTINUE.

Even with all of these prompts written in English, the results are honestly mediocre and inconsistent. In another language the results are probably going to be disastrous.
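To make the workaround concrete, here is a minimal sketch of the kind of client-side parsing those prompts imply. The names `ConversationState` and `classify_reply` are purely illustrative (they are not from any SDK), and the trailing "?" check is the fallback heuristic from the first prompt.

```python
from enum import Enum
from typing import Tuple


class ConversationState(Enum):
    # Values mirror the prefixes requested in the system prompts above.
    FOLLOWUP_NEEDED = "FOLLOWUP_NEEDED"
    END_NATURALLY = "END_NATURALLY"
    CAN_CONTINUE = "CAN_CONTINUE"
    UNKNOWN = "UNKNOWN"  # the model ignored the instructions


def classify_reply(text: str) -> Tuple[ConversationState, str]:
    """Return the detected state and the reply with any prefix removed."""
    stripped = text.strip()
    for state in (
        ConversationState.FOLLOWUP_NEEDED,
        ConversationState.END_NATURALLY,
        ConversationState.CAN_CONTINUE,
    ):
        if stripped.startswith(state.value):
            # Drop the marker plus any separator the model added after it.
            return state, stripped[len(state.value):].lstrip(" :\n")
    # Fallback: a trailing question mark suggests a follow-up question.
    if stripped.endswith("?"):
        return ConversationState.FOLLOWUP_NEEDED, stripped
    return ConversationState.UNKNOWN, stripped
```

This only works when the model actually emits the marker, which is exactly the unreliable part, and it does not help at all with voice output, where the marker is spoken before we can strip it.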

We would really love an enumerated constant returned in any of the response forms (realtime / WebSocket, normal / REST) that would indicate what state the conversation is in.
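Roughly what we have in mind, as a sketch only: the `conversation_state` field, its values, and the payload shape below are entirely hypothetical and do not exist in any current API response. The point is that the state would arrive as metadata next to the text or audio, so nothing extra gets spoken.

```python
import json
from enum import Enum


class ConvState(str, Enum):
    # Hypothetical values for the requested enumerated constant.
    NEEDS_USER_INPUT = "needs_user_input"
    ENDED = "ended"
    CAN_CONTINUE = "can_continue"


def read_state(payload: str) -> ConvState:
    """Read the hypothetical `conversation_state` field from a JSON response."""
    data = json.loads(payload)
    # Raises ValueError if the value is not one of the known states.
    return ConvState(data["conversation_state"])


# Made-up response body, for illustration only.
example = '{"text": "Which city are you in?", "conversation_state": "needs_user_input"}'
state = read_state(example)
```

With something like this, the client could branch on `state` directly instead of guessing from the wording of the reply.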