New models appear on the API

I was looking through the /models endpoint of the API and found two new models listed:

  • canary-whisper - Latest whisper Yay! (likely V3)
  • canary-tts - not sure what’s added to tts, but latest
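For anyone who wants to spot additions like these without eyeballing the list, here is a minimal sketch. It assumes the standard `GET https://api.openai.com/v1/models` response shape (a `data` array of objects with an `id` field); the function names and the "previous snapshot" idea are my own, not anything from the API.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/models"


def fetch_model_ids(api_key: str) -> list[str]:
    """Return the ids currently listed by the /models endpoint."""
    request = urllib.request.Request(
        API_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [model["id"] for model in payload["data"]]


def find_new_models(previous: list[str], current: list[str]) -> list[str]:
    """Return ids present in `current` but not in `previous`, sorted."""
    return sorted(set(current) - set(previous))
```

Save the output of `fetch_model_ids` somewhere, call it again later, and pass both lists to `find_new_models` to see what has appeared in between.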


Same here; then a search led me to the forum. Usually my searches for specialized areas of knowledge lead to my own writings on the topic already…

canary-tts, created Wednesday, November 8, 2023 5:22:15 PM California time
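The date comes from the model's `created` field, which is a Unix timestamp. A small sketch of the conversion follows; the epoch value below is back-computed from the date quoted above rather than copied from an API response, and the fixed UTC-8 offset is an assumption that only holds while California is on standard time (as it was on this date).

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset (Pacific Standard Time); assumption: no DST on this date.
PST = timezone(timedelta(hours=-8), "PST")


def created_to_pacific(created: int) -> datetime:
    """Convert a Unix `created` timestamp to a PST-aware datetime."""
    return datetime.fromtimestamp(created, tz=PST)


# Back-computed from "November 8, 2023 5:22:15 PM California time".
CANARY_TTS_CREATED = 1699492935

print(created_to_pacific(CANARY_TTS_CREATED).strftime("%A, %B %d, %Y %I:%M:%S %p"))
```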

Canary, like a canary in a coalmine, to detect who will play with random models? Canary is also the bleeding-edge Windows 11 version for insiders.

Still takes the same voice specifications:

(enum_values=[<Voice.NOVA: 'nova'>, <Voice.SHIMMER: 'shimmer'>, <Voice.ECHO: 'echo'>, <Voice.ONYX: 'onyx'>, <Voice.FABLE: 'fable'>, <Voice.ALLOY: 'alloy'>])

I can say that the voice quality is significantly degraded at speed 1.2, with a choppy effect, so I won't try that on the other models either. The speed change is likely applied as post-processing.
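A rough sketch of the request body these settings map to, checked locally before anything is sent. It assumes the usual `/v1/audio/speech` JSON fields (`model`, `input`, `voice`, `speed`, `response_format`) and the 0.25 to 4.0 speed range as I understand the docs; `build_speech_request` is a hypothetical helper, not part of any library.

```python
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # assumed documented range for `speed`


def build_speech_request(
    text: str,
    model: str = "canary-tts",
    voice: str = "alloy",
    speed: float = 1.0,
    response_format: str = "flac",
) -> dict:
    """Validate the voice and speed, then return the JSON body as a dict."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} outside {MIN_SPEED}..{MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
```

Leaving `speed` at its default of 1.0 avoids the choppy effect described above.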

testing generation speeds

tts_canary-tts_alloy_20231113_093208.flac took 2.64 seconds
tts_tts-1_alloy_20231113_093230.flac took 2.51 seconds
tts_tts-1-hd_alloy_20231113_093254.flac took 2.85 seconds
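Timings like these can be collected with a small harness along the following lines. This is a sketch, not the script that produced the numbers above: `synthesize` stands in for whatever function actually calls the API and returns the audio, and the filename pattern is inferred from the names in the list.

```python
import time
from datetime import datetime
from typing import Callable, Optional


def output_name(
    model: str, voice: str, now: Optional[datetime] = None, ext: str = "flac"
) -> str:
    """Build a name like tts_<model>_<voice>_<YYYYmmdd>_<HHMMSS>.<ext>."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"tts_{model}_{voice}_{stamp}.{ext}"


def time_models(
    synthesize: Callable[[str, str, str], bytes],
    models: list[str],
    voice: str,
    text: str,
) -> dict[str, float]:
    """Call `synthesize(model, voice, text)` once per model; return seconds taken."""
    timings = {}
    for model in models:
        start = time.perf_counter()
        synthesize(model, voice, text)
        timings[model] = time.perf_counter() - start
    return timings
```

A single run per model is noisy, since network latency can easily swamp a few tenths of a second, so repeat and average before drawing conclusions.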

The new billing usage makes it impossible to find the cost.

There are things on canary-tts that are amazing! Like speech modulation.


It took another day to see billing, with their poor breakdown. I didn't log all the inputs, as I was also writing a utility library to ease the chunking of document files for max character limits, audio file appending, and the awkwardness of the Python library.
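The chunking part can be sketched roughly like this. It is not the utility library itself, just an illustration: it assumes a 4096-character limit per TTS input, and `chunk_text` is a made-up name. It prefers to break at a sentence end, then at a space, and only hard-splits when neither exists.

```python
MAX_CHARS = 4096  # assumed per-request character limit for TTS input


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Prefer the last sentence boundary inside the window.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with its sentence
        else:
            cut = window.rfind(" ")  # otherwise fall back to the last space
        if cut <= 0:
            cut = limit  # no boundary at all: hard split
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk then becomes one request, and the resulting audio files are appended in order.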


Imagine: set the class object's parameters once if you don't like my defaults, then keep feeding text to methods that append to the existing file, or just call it with an input and output file. I'm now uninspired to continue writing it, though, as it would not be for me.

(the TTS fails on reading back OICU812RUOK, for maximum audio per input)


Here’s a tiny sample of speech modulation on canary-tts:

This was made entirely with the API. No post-processing.


So what was the bit at the beginning? A sob?

This is a sentence.
These are words in a sequence

Experimented last night and could not get the thing to work.


That was a soft laugh. It’s not engineered, the model knows what a soft laugh is IMO.

I’ll be writing about this more, once I have done some more tests.


Did you just prompt [soft laugh] or?

To me it sounds a bit like inhaling?


Yes, soft laugh and then some more. Certainly there’s room for more detail, will be sharing more soon.


Could you please give me the prompt? I have not been able to recreate this.

Here’s a test: check your billing the day after using this canary-tts.

Even without having used any untrained completion models like davinci-002, do you have “base models” in your billing usage for the day, with an extraordinary charge?

The rate limits for the model are not listed under “audio”; it is instead rate-limited as a completion model, with completion token limits. Is it billed that way too?

Nope nothing like that. It’s billed under its own category:

And I did use base model davinci for an experiment which is reflected here:

Another model is now added to the API: whisper-1-1p