Speech to Speech via API vs. waiting for GPT-4o voice

I have a set of trained data, assistants, etc. working brilliantly on GPT-4o.

And, for my use cases, the responses from GPT-4o beat Llama 3, Google, etc. hollow!

Now, I would like to extend this to speech to speech. I can wait for GPT-4o voice to be made public on the API, but this could be a long wait (any ideas, anyone?).

While I wait for the above, the only thing I can think of is to extend my implementation to do text to speech and vice versa. But then, of course, I will be hit by latency, even though the text responses currently stream!
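One way to reduce the latency of a chained speech-to-text → model → text-to-speech pipeline is to not wait for the full text response: since the text already streams, you can buffer the streamed deltas and hand each *complete sentence* to the TTS step as soon as it arrives. Here is a minimal sketch of that chunking step; `sentence_chunks` and the regex boundary are my own illustration, not anything from the OpenAI SDK, and a real pipeline would feed each yielded sentence to whatever TTS call you use:

```python
import re

def sentence_chunks(token_stream):
    """Buffer streamed text deltas and yield complete sentences.

    Sending each finished sentence to TTS immediately hides most of the
    text-generation latency, instead of waiting for the whole reply.
    """
    buffer = ""
    # Sentence-ending punctuation followed by whitespace marks a flush point.
    boundary = re.compile(r"(?<=[.!?])\s+")
    for delta in token_stream:
        buffer += delta
        parts = boundary.split(buffer)
        # Everything except the trailing fragment is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()

# Simulated streamed deltas, standing in for a streaming chat response:
deltas = ["Hel", "lo there. How", " are you? I", "'m fine."]
print(list(sentence_chunks(deltas)))
# → ['Hello there.', 'How are you?', "I'm fine."]
```

With this shape, time-to-first-audio is roughly the time to generate one sentence plus one TTS call, rather than the full response length.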

Does anyone have any better ideas on this? Or, when is GPT-4o voice likely to be made available publicly?

Thanks!

I’ve seen a few videos where people get some pretty good latency, not sure if good enough for your situation though.

Per the Advanced Voice Mode FAQ:

We are planning for all Plus users to have access in the fall. Exact timelines depend on meeting our high safety and reliability bar. We are also working on rolling out the new video and screen sharing capabilities we demoed separately, and will keep you posted on that timeline.
