Hi, I am using the Assistants API with specific instructions. At the moment, to hold a conversation I have to chain voice-to-text, the Assistants API, and then TTS.
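
To make the current setup concrete, here is a rough sketch of one conversation turn. It only shows the shape of the chain: the three stage functions are hypothetical stand-ins I made up for illustration (they are not real SDK calls), and the real versions would call the speech-to-text, Assistants, and TTS endpoints.

```python
from typing import Callable

# Hypothetical stage signatures, named here only for illustration.
Transcribe = Callable[[bytes], str]   # audio in -> user text
Respond = Callable[[str, str], str]   # (instructions, user text) -> reply text
Synthesize = Callable[[str], bytes]   # reply text -> audio out


def run_turn(
    audio: bytes,
    instructions: str,
    transcribe: Transcribe,
    respond: Respond,
    synthesize: Synthesize,
) -> bytes:
    """Run one turn: speech -> text -> assistant reply -> speech."""
    user_text = transcribe(audio)                   # hop 1: voice to text
    reply_text = respond(instructions, user_text)   # hop 2: assistant with my instructions
    return synthesize(reply_text)                   # hop 3: text to speech


# Stand-in stages so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(instructions: str, user_text: str) -> str:
    return f"[{instructions}] reply to: {user_text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    out = run_turn(b"hello", "Be brief.", fake_transcribe, fake_respond, fake_synthesize)
    print(out)
```

Each turn goes through three separate hops, and the instructions only come into play in the middle one.
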
I know I can use the Realtime API for native speech-to-speech: skipping the intermediate text format means low latency and nuanced output.
But I still need to use my own instructions with it.