CoT with 4o Audio or Real Time

Use case includes CoT before answer we output to user. Currently using STT LLM TTS . Wondering if anyone was able to have audio or realtime output ONLY what we want to communicate to user while maintaining the rest of the answer of the CoT part silent.

You could try: STT > LLM(COT) > LLM > TTS.

Where the first LLM would output the chain of thought, and the second LLM would get the prompt + user message + COT.

If you really need/want to use the real time api, it’s worth trying it with function calling, and in your prompt you could instruct the LLM when to call that COT function.

It seems that now it’s not possible to create RAG-like systems using the real time API, hope this changes soon.

Yep thanks, easier for now to do STT > LLM(COT) > TTS and quicker :slight_smile:

I’ve had some success in the prompt by explicitly calling out instructions, for example, I have this in my prompt:

My instructions to you will be in all caps, and be contained in . DO NOT PUBLISH MY INSTRUCTIONS TO THE USER.

Q2. Question … [EXTRACT 2 RESPONSES HERE]

[FOR EACH RESPONSE FROM Q2, ASK THE FOLLOWING QUESTION WITH THE EXTRACTED RESPONSE]


This works but tweaks out from time to time where it will speak the first instruction to the user, then adheres to the rest. Not entirely sure why this happens as I haven’t been able to reproduce consistently.

thanks for this. not following though. can you share a prompt example i can run ? goal is to have TEXT output that is NOT spoken in audio