Udio: New Music Generator text2audio from Nvidia?

Rumors have been circulating for a few days now, so I’ve decided to put my detective’s hat on once again, for an AI Mystery Investigation. In this video, you’ll not only learn what this new platform is, but we’ll hear some samples from it, which I’ll do some analysis on to see if it really IS a threat to Suno’s AI Music Throne.


It’s not a mystery.

Assuming it is Udio, it’s a product of Uncharted Labs, Inc., which is made up of some former Google researchers.


A bit more research from Reddit…



open beta: https://www.udio.com/

edit: i’ve experienced a few minor hiccups with the website and suspect that i’ll see more of that as this starts popping up on people’s rAIdar :smiley:

2nd edit: like i said :laughing:

idk if that qualifies for /softwaregore but yeah they appear to be experiencing some growing pains

3rd edit: ~200k views in ~2.5hrs on their intro post. it may take them a moment to catch their breath:


Thanks for sharing!

Anyone else give it a go yet? Crazy they’re giving away so much for free … they’re gonna grab all the market share soon!

I’ve not had a chance to play around yet.

It is amazing, from my trials. I finished a metal piece that runs over 4 minutes, built through extension and curation. Can we include audio links here?

Here is a link: Udio | Neon Shredscape by Paul Fishwick


I tested it. Lyrics written in languages other than English don’t seem to come out so well yet. Other than that, it sounds good. I turned Invictus by William Henley into ska lol.

Just starting to try it, but it is obvious that it won’t result in a top-40 smash. One generation has lyrics that are on the theme of the prompt, another go just sounds like plausible English – if you didn’t know the language. But with enough prompting the lyrics can be made clear and intelligible. There’s a bit of prompt disobedience and mentioning of instructions in the output, but if you’ve used DALL-E, it’s nothing new.

It’s actually pretty amazing, even when it writes its own lyrics based on a theme and you then refine the prompt to transition the style. The only thing missing is a way to give an entire piece more form than merely “extend” allows, such as highlighting a chorus and having the song return to what’s been written and sung before. Forcing your own lyrics on it might supply the repetition that gives music its appeal.

My masterwork of “pride” (a bit of humor), that takes a while to get there. Seeing the lyrics in the link is kind of a spoiler.

Update at 6hrs in. I was really getting the hang of this and sculpting what I want, until heartbreak at 4:22: “This song is too long to extend”. Right at the core of what was going great. Enjoy:


Here’s a creative re-imagination of a tune you might recognize, with another style for it that suits the lyrics being a bit mashed up…

Udio | Midday Reprieve by King Krispus

Tell me that’s not crazy close to a song? And with remarkable text-to-speech.

Still, this technology seems better suited to a producer making loops or an “evolution” of a beat than to going for a whole catchy song (which I let continue with nonsense vocals). Generating 30 seconds at a time breaks the audio into unnatural prompted segments, sometimes forgetting the earlier volume and voices, and there is no way to return to a chorus or melody. And I expect that, as with DALL-E or even ChatGPT, the excitement will be tamed once people find that everything it outputs is kind of the same.

Does anyone know whether they have an API, or when they are opening one?

Do they even hint there would be an API? They do plan to transition to paid-only.

The largest percentage of outputs are discardable, and continuing from the base 33-second clip through the four different “extend” methods takes even more prompt interaction, based on your impression of where the first seed of an idea might need to go.

The UI they have is incredibly mature and attuned to the particular product features, with few foibles. The sharing, for one, looks better than OpenAI’s “store” does after half a year.

Something completely amazing, like this clip I made from a Whisper transcript that blows miles past OpenAI TTS, simply takes an hour of work to get right.
