Can I use audio transcriptions in the prompt as LLM context for calling the appropriate function with the Realtime API's function calling?

Hello.

I am making a conversational bot with a dedicated call flow which relies only on the ‘text’ modality, i.e. I use OpenAI’s Realtime API only for function calling, in order to minimize cost.

Each step in the call flow has a dedicated function which plays prerecorded, interruptible audio clips to the user.
The user’s audio flows through my Asterisk ARI application and then goes to OpenAI’s WebSocket. The audio is transcribed and triggers the appropriate function, which plays the relevant recording.
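
To make the setup concrete, the session configuration is along these lines (a simplified Python sketch; the `play_hospital_list` function, the instructions wording, and the exact parameters are placeholders rather than my actual flow):

```python
import json

async def configure_session(ws):
    """Configure the Realtime session for text-only output with one tool.

    `ws` is assumed to be an already-open WebSocket to the Realtime API
    (e.g. from the `websockets` package). Names below are placeholders.
    """
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text"],  # text only: no audio generated by the model
            "instructions": (
                "Route the caller through the insurance call flow "
                "by calling exactly one function per turn."
            ),
            "tools": [{
                "type": "function",
                "name": "play_hospital_list",  # placeholder step function
                "description": "Play the prerecorded list of covered hospitals.",
                "parameters": {"type": "object", "properties": {}, "required": []},
            }],
            "tool_choice": "auto",
        },
    }))
```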

The problem I’m currently having is that the LLM has no context of the conversation, so how can it call the appropriate function when interrupted?
It does work, but due to the nature of the setup, the request needs to carry its own context to trigger the function: e.g. “Which hospitals are applicable in this insurance?” works, but “Which hospitals?” doesn’t. I understand why, since the LLM has no context of the conversation apart from the user’s audio.

Is there any way of inserting context into the LLM about the audio functions being used, so the LLM has something to go on?

Maybe via the prompt? Just a thought.

I’m not sure I fully understand your use case. When you say “only text mode” together with the OpenAI Realtime API, it seems contradictory, since my understanding is that the Realtime API is primarily a speech API.

Having said that, to control the context seen by the Realtime API, have you looked at conversation.item.create?

https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create

That should be able to give you fine-grained control over the context seen by the model.
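
For example, each time you play one of your recordings, you could create an assistant message item summarizing what the caller just heard, so a follow-up like “Which hospitals?” has something to resolve against. A minimal sketch, assuming a raw WebSocket connection and a helper name of my own invention:

```python
import json

async def record_played_audio(ws, summary: str):
    """Add an assistant message item describing the recording just played.

    Assistant message items created by the client use content type "text".
    """
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": summary}],
        },
    }))
```

You would call it right after Asterisk starts playing a recording, passing a one-line summary of what that recording says.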

Another approach that might work would be to send session.update messages and modify the instructions, but I think the conversation.item.create approach would be more standard.
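
A rough sketch of that alternative, folding the current call-flow state into the instructions before each model turn (the field wording here is only illustrative):

```python
import json

async def refresh_instructions(ws, current_step: str, last_recording: str):
    """Rewrite the session instructions with the current call-flow state."""
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "instructions": (
                "You are routing an insurance call.\n"
                f"Current step: {current_step}\n"
                f"Last recording played to the caller: {last_recording}\n"
                "Call the function that matches the caller's next request."
            ),
        },
    }))
```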

BTW I’m assuming you’re using WebRTC? It would help if you could provide the broad strokes of how you are interacting with the OpenAI Realtime API, and what other OpenAI APIs you’re interacting with as well.

I’m sorry, I should’ve been more transparent about my use case.

Text-only mode means using only the text modality; the Realtime API offers both text and audio, and normally you would provide both.

But in my particular case, I’m only using text. The user’s audio flows through my Asterisk ARI application and then goes to OpenAI’s WebSocket. The audio is transcribed and triggers the appropriate function, which plays the relevant recording.

Hopefully, I was clear. I apologize for any confusion, English is not my first language.

OK, my mistake, sorry; I have only used the Realtime API in voice-to-voice mode. I see that it supports text as well.

No worries, I wasn’t descriptive enough. I have updated my explanation.

In your setup the model effectively has no memory of the conversation: the prerecorded answers you play back never pass through the API, so it can’t know what “Which hospitals?” refers to unless you give it that context on every turn. You need to store the conversation state yourself (current step, last question, expected intents, etc.) and send it as part of the system message or as a small state object with each request.
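
One possible shape for that state, kept in your ARI application and injected before each response.create (the field names are illustrative, not required by the API):

```python
import json
from dataclasses import dataclass, field

@dataclass
class CallState:
    """Per-call state kept in the ARI application."""
    current_step: str = "greeting"
    last_question: str = ""
    last_recording: str = ""
    expected_intents: list[str] = field(default_factory=list)

    def as_context(self) -> str:
        return (
            f"Call-flow step: {self.current_step}. "
            f"Caller's last question: {self.last_question or 'none'}. "
            f"Last recording played: {self.last_recording or 'none'}. "
            f"Likely next intents: {', '.join(self.expected_intents) or 'any'}."
        )

async def send_turn(ws, state: CallState):
    # Inject the state as a system message item, then ask the model to respond.
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "system",
            "content": [{"type": "input_text", "text": state.as_context()}],
        },
    }))
    await ws.send(json.dumps({"type": "response.create"}))
```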

Thank you for your response.

By system message, you are referring to the system prompt, right? The one in the instructions parameter we use?

Can you kindly explain any method I can use to store the conversation state?