I’m working on a little “actor” app that provides AI voice acting and whatnot. It’s fun to work with, but I think the OpenAI definitions of voices could include a little more metadata so that a person won’t have to do what I did.
I recommend adding some additional information to each model, such as their “apparent gender,” “vocal_range,” and “responsiveness to direction.” This way, you don’t have to research all of the voices on your own to make a choice.
Here’s how I did it:
| model_slug | gender | vocal_range | direction_responsiveness | description |
|---|---|---|---|---|
| alloy | female | contralto | 1 | Smoky, kinda husky female alto. Not great with accents. |
| ash | male | baritone | 1 | Kinda scratchy yet upbeat male baritone. Not great with accents. |
| ballad | male | tenor_2 | 2 | Male, clear, second tenor, slight British accent out-of-the-box. |
| coral | female | soprano_2 | 4 | Female, alto or second soprano, clear, good with accents. |
| echo | male | tenor_1 | 3 | First tenor, energetic and warm by default. |
| fable | female | alto | 3 | Alto, slight English / New Zealand accent out-of-the-box. |
| onyx | male | bass | 4 | Bass / baritone, a little husky and quite a vocal range. |
| nova | female | alto | 5 | Alto, very responsive to voice direction. |
| sage | female | soprano_2 | 5 | Second soprano, very responsive to voice direction. |
| shimmer | female | contralto | 4 | Alto or contralto, responsive to voice direction, soothing in general. |
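For anyone who wants to use the table above in code, here is a minimal sketch of it as a lookup structure. The `VoiceProfile` class and `find_voices` helper are hypothetical names made up for illustration, the ratings are just the subjective values from the table, and I am assuming the scale runs from 1 (barely follows direction) to 5 (very responsive). The bass range is spelled `bass` here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    slug: str
    gender: str
    vocal_range: str
    responsiveness: int  # assumed scale: 1 (low) to 5 (very responsive)


# Values copied from the table above; they are one listener's impressions,
# not official OpenAI metadata.
VOICES = [
    VoiceProfile("alloy", "female", "contralto", 1),
    VoiceProfile("ash", "male", "baritone", 1),
    VoiceProfile("ballad", "male", "tenor_2", 2),
    VoiceProfile("coral", "female", "soprano_2", 4),
    VoiceProfile("echo", "male", "tenor_1", 3),
    VoiceProfile("fable", "female", "alto", 3),
    VoiceProfile("onyx", "male", "bass", 4),
    VoiceProfile("nova", "female", "alto", 5),
    VoiceProfile("sage", "female", "soprano_2", 5),
    VoiceProfile("shimmer", "female", "contralto", 4),
]


def find_voices(gender=None, vocal_range=None, min_responsiveness=1):
    """Return slugs that match the filters, most responsive first."""
    matches = [
        v for v in VOICES
        if (gender is None or v.gender == gender)
        and (vocal_range is None or v.vocal_range == vocal_range)
        and v.responsiveness >= min_responsiveness
    ]
    # sorted() is stable, so equally rated voices keep their table order.
    return [v.slug for v in sorted(matches, key=lambda v: -v.responsiveness)]
```

For example, `find_voices(gender="male", min_responsiveness=3)` would narrow the choice to the male voices that take direction reasonably well, without having to remember any of the names.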
There are two “generations” of voices to employ, where the latter are more adaptive (but the most responsive to prompting seem to still be those assigned to ChatGPT):
"voice": voice, # Choose voice: STANDARD (fable, onyx, nova, shimmer)
# or NEW (alloy, ash, ballad, coral, echo, sage) + (verse)
openai.fm has a sparkles icon on the new AI voices.
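A rough sketch of how that standard/new split could be kept in code. The grouping is copied from the comment above; `voice_generation` and `build_speech_request` are hypothetical helpers, and the request keys (`model`, `voice`, `input`, `instructions`) and the default model name are my assumptions about the speech endpoint, so check them against the current API reference. Nothing here actually calls the API.

```python
# Grouping taken from the comment above; "verse" is listed there as an extra.
STANDARD_VOICES = {"fable", "onyx", "nova", "shimmer"}
NEW_VOICES = {"alloy", "ash", "ballad", "coral", "echo", "sage", "verse"}


def voice_generation(voice):
    """Return "standard" or "new" for a known voice slug."""
    if voice in STANDARD_VOICES:
        return "standard"
    if voice in NEW_VOICES:
        return "new"
    raise ValueError(f"unknown voice: {voice!r}")


def build_speech_request(voice, text, instructions=None, model="gpt-4o-mini-tts"):
    """Build a request body as a plain dict (assumed field names, no API call)."""
    voice_generation(voice)  # raises ValueError for an unknown slug
    body = {"model": model, "voice": voice, "input": text}
    if instructions:
        # Only include direction when some was actually given.
        body["instructions"] = instructions
    return body
```

Keeping the generation as a lookup rather than hard-coding it per call makes it easy to move a voice between groups if your own listening tests disagree with the split.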
Gender can be fluid, and “Bob’s Burgers” or “Big Mouth” show that context can McGurk the listener. Also, pulling from an older script: the “heaviness” can be indexed so you don’t have to think about the names.
# Voice mapping as a class attribute; comment = rank from 1: male to 10: female
_voice_mapping = {
    1: 'nova',     # 10
    2: 'shimmer',  # 7
    3: 'fable',    # 6 (English accent)
    4: 'alloy',    # 7
    5: 'echo',     # 2
    6: 'onyx',     # 1
}
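If you would rather pick by the rank in those comments than by index, something like this sketch could work. `VOICE_RANK` and `closest_voice` are hypothetical names; the numbers are the subjective 1 (male) to 10 (female) ranks from the comments above.

```python
# Ranks copied from the comments in the mapping above (1: male ... 10: female).
VOICE_RANK = {
    "nova": 10,
    "shimmer": 7,
    "fable": 6,
    "alloy": 7,
    "echo": 2,
    "onyx": 1,
}


def closest_voice(target_rank):
    """Return the voice whose rank is nearest to target_rank.

    On a tie, the voice listed first in VOICE_RANK wins, because min()
    keeps the first minimum and dicts preserve insertion order.
    """
    return min(VOICE_RANK, key=lambda name: abs(VOICE_RANK[name] - target_rank))
```

So a script can ask for “about a 3” and get `echo` back, instead of hard-coding a name.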
*glitches herself* Sometimes, when I read something about vocal range, I tend to take it literally, even if that’s not the intention. The reason for this is probably my ESL background.
Thanks for pointing out the difference between the new voices and the standard ones. I’ll add those to the list somehow… you can definitely tell if they’re the “standard” because they don’t react to the new instructions at all. Maybe that reactivity scale is just the way to go about it?
I also have an “androgynous” option for the voices; a few are certainly so. I just went with male/female for my own purposes, and was considering changing a few to that designation.
@Spitterworld no, they can’t exactly sing yet—though that’s probably by design. I actually got the idea from Frank Herbert’s “Dune.” He describes the speaking voices according to their vocal range. Very “novel” of him.
@aprendendo.next How did you get them to sing? Do you have a recording you can share?
They can “sing” simple things like a lullaby if you post lyrics and write instructions. I don’t have any recordings right now, but it’s not difficult. I guess it works because it is closer to storytelling than “music”.
You can start with a voice profile on OpenAI.fm. The newer voices (with a diamond mark in the buttons) are more steerable.
But don’t expect it to work for anything more sophisticated like a rock or pop song; that would be too much.