What are the differences between TTS HD and TTS? Is there any frequency difference? Are they trained differently? I am going to use it in applications, and it will be TTS HD only, so I need to know.
I’m not sure whether it’s trained differently or whether HD is just saved at a higher bitrate, etc. I’d imagine the latter?
It's twice the price for a model optimized for quality instead of speed.
The last time I checked, they had the same audio bandwidth and sample rate, and since successive generations of the same text are not identical, the two are hard to compare for quality.
The difference may be something subtle, like the number of internal dimensions or parameters of the model, or the amount of training or the perplexity reached. That would account for less “speed”, although speed doesn’t seem markedly different either.
Agree with @_j
The docs state:
For real-time applications, the standard tts-1 model provides the lowest latency but at a lower quality than the tts-1-hd model. Due to the way the audio is generated, tts-1 is likely to generate content that has more static in certain situations than tts-1-hd. In some cases, the audio may not have noticeable differences depending on your listening device and the individual person.
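Since the docs say the difference may not be noticeable on every device, one way to judge is to generate the same text with both models and listen side by side. Below is a minimal sketch using only the Python standard library. The endpoint and the model, input, and voice fields follow OpenAI's speech endpoint; the helper names, the "alloy" voice choice, and the output file names are assumptions made for illustration.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(model: str, text: str, api_key: str,
                         voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a POST request for one TTS generation."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(model: str, text: str, api_key: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`.

    Assumption: no response format is requested, so the service default
    (MP3, as far as I know) is what gets written.
    """
    request = build_speech_request(model, text, api_key)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


def compare_models(text: str, api_key: str) -> None:
    """Generate the same text with both models for a side-by-side listen."""
    for model in ("tts-1", "tts-1-hd"):
        synthesize_to_file(model, text, api_key, f"{model}.mp3")
```

Because successive runs are not identical, it is probably worth generating each model several times before drawing conclusions about quality.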
What does the general public prefer: TTS, TTS-HD, or both? Should I charge per 1k characters or monthly? I just wonder which is best.
I’d offer them both at different prices depending on what they prefer
According to the documentation, tts-1 is optimized for speed, while tts-1-hd is optimized for quality. However, in about 30 Japanese text-to-speech tests that I conducted, tts-1-hd often read parts of the Japanese text with a strange pronunciation that was neither Japanese nor English.
Therefore, it is likely that tts-1 and tts-1-hd were trained on different datasets.
I have not confirmed whether this applies to languages other than Japanese, but which one to prefer may vary depending on what language is being used for the text-to-speech.
The price is quoted per character, so it is probably not billed per token.
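With per-character pricing, the cost of a piece of text can be estimated directly from its length. The sketch below is a rough estimate only. The prices are assumptions (the thread only confirms that HD costs twice as much), so check the current pricing page. It also assumes that a "character" is counted the way Python's len() counts it, which may not match how OpenAI counts Japanese or other non-ASCII text.

```python
# Assumed prices in USD per 1,000,000 input characters. These are placeholders
# for illustration; the only thing the thread confirms is the 2x ratio.
ASSUMED_USD_PER_MILLION_CHARS = {
    "tts-1": 15.0,
    "tts-1-hd": 30.0,
}


def estimate_cost_usd(text: str, model: str) -> float:
    """Estimate the cost of synthesizing `text` with `model`.

    Assumption: billing is per character and len() (Unicode code points)
    approximates the billed character count.
    """
    return len(text) * ASSUMED_USD_PER_MILLION_CHARS[model] / 1_000_000
```

For example, under these assumed prices a 1,000-character passage would cost about $0.015 with tts-1 and about $0.03 with tts-1-hd.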
It would be nice if OpenAI trained the models better for Japanese.
If you charge monthly, you should make sure to implement some kind of upper usage cap, so users can’t drive you into bankruptcy.
For example, “$X per month, includes up to Y hours of generated audio, after which we charge you $Z per additional hour.”
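The capped plan in that example can be sketched as a small billing function. The parameter names and the numbers in the usage note are hypothetical stand-ins for X, Y, and Z, not recommended prices.

```python
def monthly_bill(hours_used: float, base_fee: float,
                 included_hours: float, overage_rate: float) -> float:
    """Return the amount owed for one month.

    base_fee       -- the flat "$X per month"
    included_hours -- the "Y hours of generated audio" covered by the fee
    overage_rate   -- the "$Z per additional hour" beyond the included hours
    """
    # Only usage above the included allowance is billed as overage.
    extra_hours = max(0.0, hours_used - included_hours)
    return base_fee + extra_hours * overage_rate
```

With a hypothetical plan of $20 per month, 10 included hours, and $3 per extra hour, a user who generates 12 hours of audio would owe 20 + 2 * 3 = $26, while a user who generates 5 hours would owe the flat $20.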
Monthly charges also have the benefit that you don’t need to account for pre-payment that “stores value” you have to keep valid for a long time. You will also have a more regular revenue stream, with subscriptions coming in every month.
It’s clear that determining how to secure profits is a challenge, as can be seen from OpenAI’s own difficulties in balancing the provision of services through ChatGPT and via their API.
Personally, I believe there is a certain rationale in combining a monthly subscription model with a pay-as-you-go system.
By adopting a monthly subscription, you can ensure a steady profit regardless of whether users utilize the service or not.
Additionally, by capping the amount of service included in the subscription, you can protect the provider’s profits from heavy usage.
Should users require more service usage, you can accommodate this by offering an additional pay-as-you-go plan.
As mentioned above, user preferences between tts-1 and tts-1-hd may vary, so it would be advisable to make both options available.
Thanks everyone, it was really helpful!
This is a related question. Will OpenAI add more voices, including more professional ones? For Chrome extensions, what is the best way to call the OpenAI API? How can I hide the OpenAI key and still make quick API calls?