Batching with an array of strings on TTS models

Hi,

I would like to run TTS over a large amount of my data; however, my rate limit (RPM) for the TTS models is low.

My TPM, however, is adequate for the purpose. When I try to make use of the TPM by batching, passing an array of strings as input, I get the following error:

Exception has occurred: BadRequestError
Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n  str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.openai.com/v1/audio/speech'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

During handling of the above exception, another exception occurred:

  File "/Users/sukhmanjawa/Projects/py-tes/ssml.py", line 9, in <module>
    response = client.audio.speech.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
openai.BadRequestError: Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> input\n  str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}

Is batching supported on the TTS endpoint?

Does not look like it :thinking: I have a contact who knows the speech and vision people, I’ll see if they can find out.


The API reference says input is only a string.

input (string) - Required
The text to generate audio for. The maximum length is 4096 characters.

The error you show comes from sending something other than a string in that position (an array, which only works on the completions endpoint).

Rate limits for tts-1 are 100-500 requests per minute depending on tier, certainly suitable for parallel processing.

tts-1-hd, at 3 requests per minute across tiers, is the only one where you couldn't run lots of parallel jobs; it is better suited to one request at a time, waiting for the next minute after every three.

I don’t have those rate limits, which is the reason I posted here.

Clarify: Are you somehow free of rate limits, or overly-restricted?

tts-1
Free: 3 RPM 200 RPD
Tier 1,2: 50 RPM
Tier 3,4: 100 RPM
Tier 5: 500 RPM

Do you just need techniques to make parallel async or threaded calls?
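As a sketch of what parallel calls could look like: the snippet below fans requests out over a thread pool while pacing submissions to stay under an RPM budget. `synthesize` here is a placeholder; in real use it would wrap `client.audio.speech.create(...)` and the `rpm` value would be set to your tier's limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def synthesize(text: str) -> bytes:
    # Placeholder for the real call, e.g.:
    #   client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    # which returns the audio for a single string.
    return text.encode()

def tts_batch(texts, rpm=50, workers=8):
    """Run one TTS request per string, spacing request starts so
    they stay under `rpm` requests per minute."""
    interval = 60.0 / rpm
    results = [None] * len(texts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i, text in enumerate(texts):
            futures.append((i, pool.submit(synthesize, text)))
            time.sleep(interval)  # pace submissions to respect the RPM budget
        for i, fut in futures:
            results[i] = fut.result()  # preserve input order in the output
    return results
```

Pacing the *starts* of requests (rather than counting completions) keeps the logic simple and is conservative with respect to the per-minute limit.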


You can simulate a “batch” by joining your texts up to the character limit, prompting for a pause longer than anything that would occur in normal reading, and then splitting the received audio with silence detection.

example:

prompt = """
. The error you show is sending something other than a
string in that position (an array, which only works on “completion”).
[pause for 5 seconds] [another pause]
Rate limits for tts-1 are 100-500 per minute, 
certainly suitable for parallel processing.
""".strip().replace('\n', " ")

Resulting audio (good for 24 hours)
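For the splitting step, a minimal silence-detection sketch is below. It assumes you have already decoded the returned audio to a flat list of mono amplitude samples (e.g. via pydub or ffmpeg; that decoding is not shown), and the threshold and run-length values are illustrative, not tuned.

```python
def split_on_long_silence(samples, threshold=0.01, min_silence=24000):
    """Split a mono sample sequence wherever |amplitude| stays below
    `threshold` for at least `min_silence` consecutive samples
    (roughly 0.5 s at a 48 kHz sample rate)."""
    segments, start, quiet = [], 0, 0
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            quiet += 1  # extend the current quiet run
        else:
            if quiet >= min_silence:
                cut = i - quiet // 2  # cut in the middle of the silent run
                segments.append(samples[start:cut])
                start = cut
            quiet = 0
    segments.append(samples[start:])  # trailing segment
    return segments
```

Note this is exactly where model-inserted pauses can bite: any natural pause longer than `min_silence` produces a spurious split, so the prompted pause must be clearly longer than anything the model generates on its own.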

My rate limit for tts-1 is 3 RPM.

Initially I also considered silence detection for batching, but there are points in the audio (due to the nature of what I'm generating) where the model inserts pauses on its own, which could break the segmentation.

Another issue is cost: inserting pauses as you suggest wouldn't be economical at current pricing.

My use case is generating small but numerous segments of audio, so the pauses themselves would add noticeably to the cost.