Voice correspondences between ChatGPT and Realtime API

In the ChatGPT app, advanced voice mode offers the following voices:

  • Arbor - Easygoing and versatile
  • Breeze - Animated and earnest
  • Cove - Composed and direct
  • Ember - Confident and optimistic
  • Juniper - Open and upbeat
  • Maple - Cheerful and candid
  • Sol - Savvy and relaxed
  • Spruce - Calm and affirming
  • Vale - Bright and inquisitive

If I’m not mistaken, advanced voice mode is built with the Realtime model.

The Realtime API offers the following 8 voices:

  • alloy
  • echo
  • shimmer
  • ash
  • ballad
  • coral
  • sage
  • verse
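
For context, here is a minimal sketch of how one of these voices might be selected in a Realtime session. It only builds the JSON event and does not open a connection. The `session.update` event with a `session.voice` field is my understanding of the API and should be checked against the current reference, since the field's location may differ between API versions. `REALTIME_VOICES` and `build_session_update` are hypothetical names used for illustration.

```python
import json

# Voice names exactly as listed above.
REALTIME_VOICES = (
    "alloy", "echo", "shimmer", "ash",
    "ballad", "coral", "sage", "verse",
)


def build_session_update(voice: str) -> str:
    """Return a JSON-encoded event that asks the session to use `voice`."""
    if voice not in REALTIME_VOICES:
        raise ValueError(f"unknown Realtime voice: {voice!r}")
    # Assumed event shape: verify against the current API reference.
    event = {"type": "session.update", "session": {"voice": voice}}
    return json.dumps(event)


if __name__ == "__main__":
    print(build_session_update("sage"))
```

Note that a ChatGPT name such as "Sol" is rejected by this check, which is the mismatch the question is about.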

My question: Is there a correspondence between these two lists, and why are they different?

Thank you in advance for your reply!

Advanced voice mode is definitely not built on top of the Realtime API. The performance of advanced voice mode is far superior.

How do you know that? It would make sense for AVM to be built on top of the Realtime API (except that some voices, such as Sol, have not been released to the public).

In the same spirit, ChatGPT should also be built on top of the API. If your chat app is not as good as ChatGPT, that doesn’t mean ChatGPT is not built on top of the API.