These voices are now in the changelog and committed to the Python library.
Was:

Supported voices are alloy, echo, fable, onyx, nova, and shimmer.

Now:

Supported voices are alloy, ash, ballad, coral, echo, sage …
From the Python commit: the new list is enforced and allowed only as of openai-1.53.0.

Now operational (and do the others continue?)
Voice samples:

- alloy
- ash (new)
- ballad (new)
- coral (new)
- echo
- sage (new)
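For reference, the shape of a call that requests one of these voices: a minimal sketch of building the request kwargs for `client.chat.completions.create(...)`, assuming the `gpt-4o-audio-preview` model and openai-python >= 1.53.0. The voice set below is only the set of voices mentioned in this post, not the SDK's full enforced list.

```python
# Sketch: assemble a Chat Completions request that asks for spoken output.
# Assumes model "gpt-4o-audio-preview" and openai>=1.53.0; KNOWN_VOICES is
# just the voices discussed in this post, not the SDK's authoritative list.
NEW_VOICES = {"ash", "ballad", "coral", "sage"}
KNOWN_VOICES = {"alloy", "echo", "fable", "shimmer"} | NEW_VOICES

def build_audio_request(voice: str, user_text: str) -> dict:
    """Return kwargs for client.chat.completions.create(**kwargs)."""
    if voice not in KNOWN_VOICES:
        raise ValueError(f"voice {voice!r} is not among the voices discussed here")
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],           # ask for transcript + speech
        "audio": {"voice": voice, "format": "wav"},
        "messages": [{"role": "user", "content": user_text}],
    }

kwargs = build_audio_request("ash", "Say hello in one sentence.")
```

Pass the result to `client.chat.completions.create(**kwargs)`; when audio is actually produced, the base64 WAV arrives on `choices[0].message.audio.data`.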
Despite the voices claiming dialects or accents in some of the statements above, unprompted, this is not a quality that can be drawn out of the model by prompting.
The prior voices still seem to work; some new samples:

- fable
- shimmer
Prompted tune-ups to style
I note that you can try all you want, but almost all attempts to get textual Chat Completions calls to answer in a voice beyond the first message fail, returning 'audio_tokens': 0 – and some produce no voice from the start, despite it being a promised modality.
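A quick way to spot that silent-failure mode is to check the usage details on the response. A sketch, assuming the usage object is read as a dict; the sample payload here is made up to mirror the `'audio_tokens': 0` result described above.

```python
# Sketch: detect the failure mode where a turn returns text only and usage
# reports 'audio_tokens': 0. The dict mirrors the Chat Completions usage
# object's completion_tokens_details; the sample payload is fabricated.
def spoke_aloud(usage: dict) -> bool:
    """True only if the completion actually produced audio tokens."""
    details = usage.get("completion_tokens_details", {})
    return details.get("audio_tokens", 0) > 0

usage_from_failed_turn = {"completion_tokens_details": {"audio_tokens": 0}}
print(spoke_aloud(usage_from_failed_turn))  # → False
```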
Prompt as clear as can be:

```
You are Shimmer, a GPT-4 large language model trained by OpenAI.
Knowledge cutoff: 2023-04
Current date: 2024-10-31
Image input capabilities: Enabled
Voice output capability: Enabled

# Responses

## voice

Important: You have multimodal voice capability, and you use voice exclusively to respond.
- Remember: text -> text -> text ... = assistant voice audio response always!
- Earlier assistant chat is a transcript of what was spoken aloud.
```
OpenAI also blocks conditioning speech with prerecorded assistant audio to activate or maintain a voice. Within a chat, you are not allowed to 0-shot the model into speaking differently – or into speaking at all. The only thing they offer is an expiring audio "id" from a prior turn that you send back to replay the same voice OpenAI stored, with no true statefulness across the total input turns. You will not be able to return to a voice chat.
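The message shape for that continuation, as a sketch: the prior assistant turn is referenced by its stored audio id rather than by a text transcript. The id and texts here are placeholders, and the id expires server-side as noted above.

```python
# Sketch: continue an audio chat the only way offered – by echoing back the
# expiring audio id from the prior assistant turn. "audio_abc123" and the
# message texts are placeholders, not real values.
def follow_up_messages(prior_audio_id: str, next_user_text: str) -> list[dict]:
    return [
        {"role": "user", "content": "Say hello."},
        # Reference the stored audio instead of supplying a transcript:
        {"role": "assistant", "audio": {"id": prior_audio_id}},
        {"role": "user", "content": next_user_text},
    ]

msgs = follow_up_messages("audio_abc123", "Now say goodbye.")
```

Once that id expires, there is nothing to send back, which is why you cannot return to an old voice chat.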