Speech to Speech via API vs. waiting for GPT-4o voice

I have a set of trained data, assistants, etc. working brilliantly on GPT-4o.

And, for my use cases, the responses from GPT-4o beat Llama 3, Google, etc. hollow!

Now, I would like to extend this to speech to speech. I can wait for GPT-4o voice to be made public on the API, but this could be a long wait (any ideas, anyone?).

While I wait for the above, the only thing I can think of is to extend my implementation to do text to speech and vice versa. But then, of course, I will be hit by latency, even though the text responses currently stream!
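One way to reduce the latency of a chained speech-to-text → model → text-to-speech pipeline is to not wait for the full text response: since the text already streams, you can buffer the streamed deltas and hand each *complete sentence* to the TTS step as soon as it arrives. Here is a minimal sketch of that chunking step; `sentence_chunks` and the regex boundary are my own illustration, not anything from the OpenAI SDK, and a real pipeline would feed each yielded sentence to whatever TTS call you use:

```python
import re

def sentence_chunks(token_stream):
    """Buffer streamed text deltas and yield complete sentences.

    Sending each finished sentence to TTS immediately hides most of the
    text-generation latency, instead of waiting for the whole reply.
    """
    buffer = ""
    # Sentence-ending punctuation followed by whitespace marks a flush point.
    boundary = re.compile(r"(?<=[.!?])\s+")
    for delta in token_stream:
        buffer += delta
        parts = boundary.split(buffer)
        # Everything except the trailing fragment is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()

# Simulated streamed deltas, standing in for a streaming chat response:
deltas = ["Hel", "lo there. How", " are you? I", "'m fine."]
print(list(sentence_chunks(deltas)))
# → ['Hello there.', 'How are you?', "I'm fine."]
```

With this shape, time-to-first-audio is roughly the time to generate one sentence plus one TTS call, rather than the full response length.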

Does anyone have any better ideas on this? Or, when is GPT-4o voice likely to be made available publicly?

Thanks!

I’ve seen a few videos where people get some pretty good latency, not sure if good enough for your situation though.

Per the Advanced Voice Mode FAQ:

We are planning for all Plus users to have access in the fall. Exact timelines depend on meeting our high safety and reliability bar. We are also working on rolling out the new video and screen sharing capabilities we demoed separately, and will keep you posted on that timeline.
