Is there a way to use the Realtime audio API with a behavioural prompt and JSON-formatted output?

Can any of the OpenAI real-time audio APIs produce more customised output, for example JSON whose structure we define ourselves? Also, can we set a behavioural prompt for the model?
My idea is: speech → the model receives the audio in real time → the model's output is produced as JSON → the app converts the JSON into actionable commands.
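The pipeline above could be sketched with the Realtime API's function calling, under some assumptions. The behavioural prompt goes in the `instructions` field of a `session.update` client event. The desired JSON shape is declared as the JSON Schema `parameters` of a function tool, and the model's JSON is then read from the `arguments` string of a `function_call` output item in a `response.done` server event. These event and field names reflect my understanding of the Realtime API reference and may differ between API versions, so check them against the current docs. `COMMAND_SCHEMA`, `emit_command` and `to_action` are hypothetical names for illustration. The WebSocket connection and audio streaming are omitted, and the sample event below is hand-written, not captured from the API.

```python
import json

# Hypothetical command vocabulary: this JSON Schema defines the JSON shape the app expects.
COMMAND_SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["turn_on", "turn_off", "set_volume"]},
        "target": {"type": "string"},
        "value": {"type": ["number", "null"]},
    },
    "required": ["action", "target"],
}


def build_session_update(instructions: str) -> dict:
    """Build a session.update event carrying the behavioural prompt and one tool.

    Field names are an assumption based on the Realtime API's session.update event.
    """
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,  # the behavioural prompt
            "tools": [
                {
                    "type": "function",
                    "name": "emit_command",  # hypothetical tool name
                    "description": "Emit one app command parsed from the user's speech.",
                    "parameters": COMMAND_SCHEMA,  # the JSON structure we define
                }
            ],
            "tool_choice": "auto",
        },
    }


def extract_commands(raw_event: str) -> list[dict]:
    """Return the parsed JSON arguments of every emit_command call in a response.done event."""
    event = json.loads(raw_event)
    if event.get("type") != "response.done":
        return []  # ignore audio deltas, transcripts and other events
    commands = []
    for item in event.get("response", {}).get("output", []):
        if item.get("type") == "function_call" and item.get("name") == "emit_command":
            commands.append(json.loads(item["arguments"]))  # arguments arrive as a JSON string
    return commands


def to_action(command: dict) -> str:
    """Map one parsed command to an app action (a plain string here, for illustration)."""
    if command["action"] not in COMMAND_SCHEMA["properties"]["action"]["enum"]:
        raise ValueError(f"unknown action: {command['action']}")
    suffix = f"={command['value']}" if command.get("value") is not None else ""
    return f"{command['action']}({command['target']}){suffix}"


session_event = build_session_update(
    "You are a voice controller. Always respond by calling emit_command."
)
# In a real client: send json.dumps(session_event) over the Realtime WebSocket.

# Hand-written stand-in for a server event, shaped as assumed above.
sample = json.dumps({
    "type": "response.done",
    "response": {"output": [{
        "type": "function_call",
        "name": "emit_command",
        "call_id": "call_1",
        "arguments": json.dumps({"action": "set_volume", "target": "speaker", "value": 30}),
    }]},
})
for cmd in extract_commands(sample):
    print(to_action(cmd))  # set_volume(speaker)=30
```

Declaring the JSON as a tool schema, rather than asking for JSON in the prompt text, lets the app validate and dispatch the model's output as structured data while the model can still speak to the user.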

Any insights, guidelines or solutions will be helpful :slight_smile: