Hi there! I’m developing an application that uses OpenAI’s models for speech-to-speech consultations, and I have a question about the Realtime API: is it possible to fine-tune it on our own data to generate native audio in a speech-to-speech format, similar to OpenAI’s latest audio features? Is that kind of customization with our own data available? Thanks for any insights!
Same question. Is there some way to use fine-tuned models with the Realtime API?
+1 – Is fine-tuning the Realtime model currently possible, or is it on the near-term product roadmap?
Would appreciate a reply on this, because my app relies on fine-tuning (text-to-text/audio) and I can’t migrate to the Realtime API without it.
I mainly want to train the LLM to respond in a certain way (logic, function calling); I’m not looking for speech-to-speech tuning.
Thanks!
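Not a fine-tuning answer, but for the “respond in a certain way (logic, function calling)” part, you can usually get quite far with session-level instructions and tool definitions instead of a fine-tuned model. Here’s a minimal sketch using the `websockets` package; the `session.update` event shape matches the Realtime API docs as I understand them, but verify the field names against the current reference, and note that `lookup_patient_record` is a made-up tool for illustration:

```python
# Sketch: steering a Realtime session with instructions + tools instead of
# fine-tuning. Assumes the `websockets` package (>= 14; older versions use
# `extra_headers` instead of `additional_headers`) and an OPENAI_API_KEY
# environment variable. Verify event/session fields against the current docs.
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # Configure behavior up front: system-style instructions plus a tool
        # the model can call -- this covers much of what fine-tuning would do.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": (
                    "You are a consultation assistant. "
                    "Always confirm symptoms before advising."
                ),
                "tools": [{
                    "type": "function",
                    "name": "lookup_patient_record",  # hypothetical tool
                    "description": "Fetch a patient's record by ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"patient_id": {"type": "string"}},
                        "required": ["patient_id"],
                    },
                }],
            },
        }))
        # Print the type of each server event (session.updated, response.*, ...)
        async for message in ws:
            print(json.loads(message)["type"])

asyncio.run(main())
```

It won’t reproduce everything a fine-tuned model can do, but for logic and function-calling behavior it may be enough to unblock a migration.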
Hey!
I’m assuming it’s not possible, since the models supported for fine-tuning, according to this OpenAI page, are:
gpt-4o-2024-08-06
gpt-4o-mini-2024-07-18
gpt-4-0613
gpt-3.5-turbo-0125
gpt-3.5-turbo-1106
gpt-3.5-turbo-0613
Since neither gpt-4o-realtime-preview nor gpt-4o-mini-realtime-preview (the models supported by the Realtime API) is on that list, I assume they can’t be fine-tuned.
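If you want to verify that empirically rather than trusting the docs page, you can just try to start a fine-tuning job against a Realtime model and watch it get rejected. A quick sketch with the official `openai` Python SDK (v1+); `train.jsonl` here stands in for a real chat-format training file:

```python
# Sketch: empirically checking which models your account can fine-tune.
# Assumes the official `openai` Python SDK (v1+) and a small JSONL file of
# chat-format training examples; the Realtime model is expected to be
# rejected with an unsupported-model error.
from openai import OpenAI, OpenAIError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # placeholder training data
    purpose="fine-tune",
)

for model in ["gpt-4o-2024-08-06", "gpt-4o-realtime-preview"]:
    try:
        job = client.fine_tuning.jobs.create(
            training_file=training_file.id,
            model=model,
        )
        print(f"{model}: accepted (job {job.id})")
    except OpenAIError as err:  # unsupported models raise an API error here
        print(f"{model}: rejected -> {err}")
```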