Fine-Tuning OpenAI's Real-Time API for Native Speech-to-Speech Audio Generation

Hi there! I’m developing an application that uses OpenAI’s models for speech-to-speech consultations. I have a question about the Realtime API: is it possible to fine-tune the model behind it on our own data so that it generates native audio in a speech-to-speech format, similar to OpenAI’s latest audio features? Is that kind of customization with our own data available? Thanks for any insights!
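
For reference, this is how I’m opening a Realtime session today; the model is pinned in the connection URL, and I don’t see any documented way to pass a fine-tuned model ID there. (A minimal sketch, assuming the `websockets` Python package, v13+, where the header keyword is `additional_headers`.)

```python
import asyncio
import json
import os

import websockets


async def main() -> None:
    # The model is selected via the query string when the socket is opened;
    # there is no documented parameter for a fine-tuned model ID here.
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(url, additional_headers=headers) as ws:
        # The server opens every session with a session.created event.
        event = json.loads(await ws.recv())
        print(event["type"])  # expected: "session.created"


asyncio.run(main())
```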


Same question. Is there some way to use fine-tuned models with the Realtime API?


+1 – Is fine-tuning the Realtime model currently possible, or on the near-term product roadmap?

Would appreciate a reply on this, because my app relies on fine-tuning (text to text/audio) and I can’t migrate to the Realtime API without it.

I mainly want to train the LLM to respond in a certain way (logic, function calling); I’m not looking for speech-to-speech tuning. Something like the sketch below is all I need.
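
To make that concrete, here is a minimal sketch of the kind of text-only fine-tune I mean, using the standard OpenAI Python SDK and a model that is on the supported list. The file name and its JSONL contents are placeholders for your own chat-format examples.

```python
# A sketch of a text-only fine-tune via the standard OpenAI Python SDK.
# "function_calling_examples.jsonl" is a placeholder for your own data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training examples (one chat-format JSON object per line).
training_file = client.files.create(
    file=open("function_calling_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the job on a fine-tunable text model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```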

Thanks!


Hey! 🙂

I’m assuming it’s not possible, since the supported models for fine-tuning, according to this OpenAI page, are:

  • gpt-4o-2024-08-06
  • gpt-4o-mini-2024-07-18
  • gpt-4-0613
  • gpt-3.5-turbo-0125
  • gpt-3.5-turbo-1106
  • gpt-3.5-turbo-0613

Since neither gpt-4o-realtime-preview nor gpt-4o-mini-realtime-preview, the models supported by the Realtime API, appears on that list, I assume they can’t be fine-tuned. You can also confirm that against the API itself; see the sketch below.
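
A quick empirical check (a sketch, assuming the standard OpenAI Python SDK): asking the fine-tuning endpoint for a realtime model should simply be rejected with an invalid-model error. `file-abc123` below is a placeholder for an already-uploaded training file ID.

```python
# Probe the fine-tuning endpoint with a realtime model; expect a rejection.
from openai import BadRequestError, OpenAI

client = OpenAI()

try:
    client.fine_tuning.jobs.create(
        training_file="file-abc123",  # placeholder file ID
        model="gpt-4o-realtime-preview",
    )
except BadRequestError as err:
    # Expected: the endpoint rejects models that aren't fine-tunable.
    print(err)
```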