Batching with an array of strings on TTS models

Hi,

I would like to run TTS over a large amount of my data; however, my rate limit (RPM) for the TTS models is low.

My TPM, however, is adequate for the purpose. When I try to make use of the TPM by batching, passing an array of strings as input, I get the following error:

Exception has occurred: BadRequestError
Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n  str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.openai.com/v1/audio/speech'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

During handling of the above exception, another exception occurred:

  File "/Users/sukhmanjawa/Projects/py-tes/ssml.py", line 9, in <module>
    response = client.audio.speech.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
openai.BadRequestError: Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n  str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}

Is batching supported on the TTS endpoint?

Does not look like it :thinking: I have a contact who knows the speech and vision people, I’ll see if they can find out.


The API reference says input is only a string.

input (string) - Required
The text to generate audio for. The maximum length is 4096 characters.

The error you show comes from sending something other than a string in that position (an array, which only works on the completions endpoint).

Rate limits for tts-1 are 100-500 requests per minute depending on tier, certainly suitable for parallel processing.

tts-1-hd, at 3 requests per minute across tiers, is the only one where you couldn't run lots of parallel jobs; it is better suited to one request at a time, waiting for the next minute after every three.

I don’t have those rate limits, which is the reason I posted here.

Clarify: Are you somehow free of rate limits, or overly-restricted?

tts-1
Free: 3 RPM 200 RPD
Tier 1,2: 50 RPM
Tier 3,4: 100 RPM
Tier 5: 500 RPM

Do you just need techniques to make parallel async or threaded calls?
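As a sketch of what parallel calls could look like: the snippet below fans requests out over a thread pool while pacing submissions to stay under an RPM budget. `synthesize` here is a placeholder; in real use it would wrap `client.audio.speech.create(...)` and the `rpm` value would be set to your tier's limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def synthesize(text: str) -> bytes:
    # Placeholder for the real call, e.g.:
    #   client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    # which returns the audio for a single string.
    return text.encode()

def tts_batch(texts, rpm=50, workers=8):
    """Run one TTS request per string, spacing request starts so
    they stay under `rpm` requests per minute."""
    interval = 60.0 / rpm
    results = [None] * len(texts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i, text in enumerate(texts):
            futures.append((i, pool.submit(synthesize, text)))
            time.sleep(interval)  # pace submissions to respect the RPM budget
        for i, fut in futures:
            results[i] = fut.result()  # preserve input order in the output
    return results
```

Pacing the *starts* of requests (rather than counting completions) keeps the logic simple and is conservative with respect to the per-minute limit.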


You can simulate a “batch” by joining your texts up to the character limit, prompting for a pause longer than anything that would occur in normal reading, and then splitting the received audio with silence detection.

example:

prompt = """
. The error you show is sending something other than a
string in that position (an array, which only works on “completion”).
[pause for 5 seconds] [another pause]
Rate limits for tts-1 are 100-500 per minute, 
certainly suitable for parallel processing.
""".strip().replace('\n', " ")

Resulting audio (good for 24 hours)
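For the splitting step, a minimal silence-detection sketch is below. It assumes you have already decoded the returned audio to a flat list of mono amplitude samples (e.g. via pydub or ffmpeg; that decoding is not shown), and the threshold and run-length values are illustrative, not tuned.

```python
def split_on_long_silence(samples, threshold=0.01, min_silence=24000):
    """Split a mono sample sequence wherever |amplitude| stays below
    `threshold` for at least `min_silence` consecutive samples
    (roughly 0.5 s at a 48 kHz sample rate)."""
    segments, start, quiet = [], 0, 0
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            quiet += 1  # extend the current quiet run
        else:
            if quiet >= min_silence:
                cut = i - quiet // 2  # cut in the middle of the silent run
                segments.append(samples[start:cut])
                start = cut
            quiet = 0
    segments.append(samples[start:])  # trailing segment
    return segments
```

Note this is exactly where model-inserted pauses can bite: any natural pause longer than `min_silence` produces a spurious split, so the prompted pause must be clearly longer than anything the model generates on its own.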

My rate limit for tts-1 is 3 RPM.

Initially I also considered silence detection for batching, but there are points in the audio (due to the nature of what I'm generating) where the model inserts pauses on its own, which could break the segmentation.

Another issue is cost: inserting pauses as you suggest wouldn't be economical at current pricing.

My use case is generating small but numerous segments of audio, so the pauses themselves would add noticeably to the cost.