Feedback on Voice Models Customization

Hiya,

I’m working on a little “actor” app that provides AI voice acting and whatnot. It’s fun to work with, but I think the OpenAI voice definitions could include a little more metadata so that other people won’t have to do what I did.

I recommend adding some additional information to each model, such as their “apparent gender,” “vocal_range,” and “responsiveness to direction.” This way, you don’t have to research all of the voices on your own to make a choice.

Here’s how I did it:

| model_slug | gender | vocal_range | direction_responsiveness | description |
|---|---|---|---|---|
| alloy | female | contralto | 1 | Smoky, kinda husky female, alto. Not great with accents. |
| ash | male | baritone | 1 | Kinda scratchy yet upbeat male baritone. Not great with accents. |
| ballad | male | tenor_2 | 2 | Male, clear, second tenor, slight British accent out-of-the-box. |
| coral | female | soprano_2 | 4 | Female, alto or second soprano, clear, good with accents. |
| echo | male | tenor_1 | 3 | First tenor, energetic and warm by default. |
| fable | female | alto | 3 | Alto, slight English / New Zealand accent out-of-the-box. |
| onyx | male | bass | 4 | Bass / baritone, a little husky and quite a vocal range. |
| nova | female | alto | 5 | Alto, very responsive to voice direction. |
| sage | female | soprano_2 | 5 | Second soprano, very responsive to voice direction. |
| shimmer | female | contralto | 4 | Alto or contralto, responsive to voice direction, soothing in general. |
| verse | male | tenor_2 | 5 | VERY responsive to voice direction. |
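
For anyone who wants to use these ratings in code, here is a minimal sketch of the same data as a Python structure with a small filter. The values are copied from the list above and are one person's impressions, not official metadata. `VoiceProfile` and `pick_voices` are hypothetical names, and the sketch assumes a higher `direction_responsiveness` means the voice follows direction better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceProfile:
    slug: str
    gender: str
    vocal_range: str
    direction_responsiveness: int  # assumed scale: 1 (low) to 5 (high)


# Subjective ratings copied from the list above; not official metadata.
VOICES = [
    VoiceProfile("alloy", "female", "contralto", 1),
    VoiceProfile("ash", "male", "baritone", 1),
    VoiceProfile("ballad", "male", "tenor_2", 2),
    VoiceProfile("coral", "female", "soprano_2", 4),
    VoiceProfile("echo", "male", "tenor_1", 3),
    VoiceProfile("fable", "female", "alto", 3),
    VoiceProfile("onyx", "male", "bass", 4),
    VoiceProfile("nova", "female", "alto", 5),
    VoiceProfile("sage", "female", "soprano_2", 5),
    VoiceProfile("shimmer", "female", "contralto", 4),
    VoiceProfile("verse", "male", "tenor_2", 5),
]


def pick_voices(gender: Optional[str] = None,
                min_responsiveness: int = 1) -> List[VoiceProfile]:
    """Return matching voices, most responsive first, ties broken by slug."""
    matches = [
        v for v in VOICES
        if (gender is None or v.gender == gender)
        and v.direction_responsiveness >= min_responsiveness
    ]
    return sorted(matches, key=lambda v: (-v.direction_responsiveness, v.slug))
```

For example, `pick_voices(gender="male", min_responsiveness=4)` narrows the list to the male voices rated 4 or higher.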

Very nice idea and report!

There are two “generations” of voices to employ, where the latter are more adaptive (though the voices most responsive to prompting still seem to be those assigned to ChatGPT):

"voice": voice,       # Choose voice: STANDARD (fable, onyx, nova, shimmer)
                      # or NEW (alloy, ash, ballad, coral, echo, sage) + (verse)

openai.fm has a sparkles icon on the new AI voices.
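
As a rough sketch of how that grouping could be used, the helper below labels a voice by generation and builds the request body for a speech call without sending anything. The field names `model`, `voice`, `input`, and `instructions` follow the OpenAI speech endpoint. The default model name is an assumption, `voice_generation` and `build_speech_request` are hypothetical helpers, and the two voice sets are copied from the comment above rather than from any official classification.

```python
from typing import Dict, Optional

# Grouping copied from the comment above; an observation, not an official list.
STANDARD_VOICES = {"fable", "onyx", "nova", "shimmer"}
NEW_VOICES = {"alloy", "ash", "ballad", "coral", "echo", "sage", "verse"}


def voice_generation(voice: str) -> str:
    """Return 'standard' or 'new' for a known voice name."""
    if voice in STANDARD_VOICES:
        return "standard"
    if voice in NEW_VOICES:
        return "new"
    raise ValueError(f"unknown voice: {voice!r}")


def build_speech_request(voice: str,
                         text: str,
                         instructions: Optional[str] = None,
                         model: str = "gpt-4o-mini-tts") -> Dict[str, str]:
    """Build a speech request body; nothing is sent over the network."""
    voice_generation(voice)  # raises ValueError on an unknown voice
    payload = {"model": model, "voice": voice, "input": text}
    if instructions:
        # Voice direction goes in its own field, separate from the spoken text.
        payload["instructions"] = instructions
    return payload
```

Keeping the payload builder separate from the network call makes it easy to log or compare what each voice is being asked to do.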

Gender can be fluid, and “Bob’s Burgers” or “Big Mouth” show that context can McGurk the listener. Also, pulling this out of an older script: the “heaviness” can be indexed so you don’t have to think about the names.

# Voice mapping as a class attribute; comment = rank from 1: male to 10: female
_voice_mapping = {
    1: 'nova',     # 10
    2: 'shimmer',  # 7
    3: 'fable',    # 6 (English accent)
    4: 'alloy',    # 7
    5: 'echo',     # 2
    6: 'onyx',     # 1
}
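
A minimal sketch of how such an index might be used, assuming the caller passes an integer “heaviness” and that out-of-range values should snap to the nearest end. `VOICE_BY_INDEX` repeats the mapping above, and `voice_for_index` is a hypothetical helper, not part of the original script.

```python
# Same index-to-voice mapping as above (1 = lightest, 6 = heaviest).
VOICE_BY_INDEX = {
    1: "nova",
    2: "shimmer",
    3: "fable",
    4: "alloy",
    5: "echo",
    6: "onyx",
}


def voice_for_index(index: int) -> str:
    """Return the voice for a heaviness index, clamping out-of-range values."""
    lowest, highest = min(VOICE_BY_INDEX), max(VOICE_BY_INDEX)
    clamped = max(lowest, min(highest, index))
    return VOICE_BY_INDEX[clamped]
```

Clamping means a caller can pass any integer and still get a valid voice name back.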

Wait a second… With this idea, will the voice models be able to sing? :0 I mean the voices from OpenAI.fm.

Lol no… These are just descriptions of each voice’s tone and pitch, and its adaptability to instructions.

*glitches herself* Sometimes, when I read something about vocal range, I tend to take it literally, even if that’s not the intention. The reason for this is probably my ESL background.

It is alright. They do sing if you play enough with instructions, but it is not guaranteed and very unstable as they are not made for this purpose.

Wait a minute… @_j thinks this is a good idea!? :face_holding_back_tears: :exploding_head:

Thanks for pointing out the difference between the new voices and the standard ones. I’ll add that to the list somehow… you can definitely tell which ones are “standard” because they don’t react to the new instructions at all. Maybe that reactivity scale is just the way to go about it?

I also have an “androgynous” option for the voices; a few are certainly so. I just went with male/female for my own purposes, and was considering changing a few to that designation.

@Spitterworld no, they can’t exactly sing yet—though that’s probably by design. I actually got the idea from Frank Herbert’s “Dune.” He describes the speaking voices according to their vocal range. Very “novel” of him.

@aprendendo.next How did you get them to sing? Do you have a recording you can share?

They can “sing” simple things like a lullaby if you post lyrics and write instructions. I don’t have any recordings right now, but it’s not difficult. I guess it works because it is closer to storytelling than “music”.
You can start with a voice profile on OpenAI.fm; the newer voices (with a diamond mark on their buttons) are more steerable.

But don’t expect it to work for anything more sophisticated, like a rock or pop song. That would be too much.

@_j Do you know if there is a character limit on “instructions”?