Fine-Tuning OpenAI's Realtime API for Native Speech-to-Speech Audio Generation

Hi there! I’m developing an application that uses OpenAI’s models for speech-to-speech consultations, and I have a question about the Realtime API: is it possible to fine-tune the underlying model on our own data so that it generates native audio in a speech-to-speech format, similar to OpenAI’s latest audio features? Is this kind of customization with our own data currently available? Thanks for any insights!


Same question. Is there any way to use fine-tuned models with the Realtime API?


+1 – Is fine-tuning the Realtime model currently possible, or on the near-term product roadmap?

Would appreciate a reply on this, because my app relies on fine-tuning (text-to-text/audio) and I can’t migrate to the Realtime API without it.

I mainly want to train the LLM to respond in a certain way (logic, function calling); I’m not looking for speech-to-speech tuning.

Thanks!
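For context on what "training the LLM to respond in a certain way (logic, function calling)" involves today, here is a minimal sketch of a single training example in the JSONL format that OpenAI's chat fine-tuning API accepts for text models. The `book_consultation` function and all its fields are hypothetical, made up purely for illustration; this trains a text model, not the Realtime model.

```python
import json

# One training example in OpenAI's chat fine-tuning JSONL format.
# Goal: teach the model to emit a tool call to a hypothetical
# "book_consultation" function when the user asks to schedule one.
example = {
    "messages": [
        {"role": "system", "content": "You are a consultation scheduling assistant."},
        {"role": "user", "content": "Can I book a consultation for Tuesday at 3pm?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "book_consultation",
                        # Arguments are a JSON-encoded string, not a nested object.
                        "arguments": json.dumps({"day": "Tuesday", "time": "15:00"}),
                    },
                }
            ],
        },
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "book_consultation",
                "description": "Book a consultation slot.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "day": {"type": "string"},
                        "time": {"type": "string"},
                    },
                    "required": ["day", "time"],
                },
            },
        }
    ],
}

# A training file is one such JSON object per line (JSONL).
line = json.dumps(example)
print(line[:60])
```

You would collect many such lines into a `.jsonl` file, upload it via the Files API, and start a fine-tuning job against a text model that supports fine-tuning; whether the resulting model can then be used with the Realtime API is exactly the open question in this thread.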
